Preface


In this Olympic year it is difficult to find anything in Australia that has not been 
affected, and that includes AUUG. Our annual conference is normally held during 
early Spring, despite the title being Winter Conference. However, this year that time 
has been usurped by the Sydney Olympics and so the decision was made to move 
the Conference to June, for this year only. Of course, to really get the feeling of a 
winter conference, the decision was made to hold it in Canberra, a place not known 
for warm winters. 

While the Olympics may be seen as an opportunity to show off Australia’s prowess to 
the world, AUUG2K has proved to be an opportunity to demonstrate our abilities in 
the Open Systems world. Normally AUUG invites a number of overseas speakers to 
the Conference, but the continued spread of the Internet and the quality of Australian
developers mean that interesting work is being carried out right here.

The conference theme of Enterprise Security, Enterprise Linux is reflected in the
layout of the program, with one day given to Security issues, one to Linux and,
reflecting our continuing support of Open Systems, a day on Open Systems. In all
these areas we have brought together the leaders in the field to outline where things
are today, and where they are moving in the future.

Looking through the program you will see major players in many of the separate
branches of Linux, *BSD and many other Open Source projects. You will also see
practitioners in the fields of security, implementation and integration
presenting case studies of their work.

Another important area of the program is the cooperation between the various 
computer societies within Australia. Again this year, the Internet Society of Australia 
(ISOC-AU) has contributed a session, and for the first time, the President of 
Electronic Frontiers Australia is presenting a paper on the issues they see facing the 
future. 

Finally, I would like to acknowledge the major Linux vendors for their
support of AUUG2K. You will see from the program that we have major contributions
from Red Hat, TurboLinux, LinuxCare and SuSE. Without their support we would 
have a much smaller conference. 

So, the Program Committee and I would like to commend this conference
program to you, and hope that you enjoy and learn from it.


Frank Crawford 

Programme Chair 
AUUG2K 
The assistance of the Australian Nuclear Science and
Technology Organisation and its staff in the production of these 
Proceedings is greatly appreciated. 
The Art of Keeping Secrets

OR 

Aspects of Good Information Security Policy 

Michael Paddon 
eSec Limited 

www.esec.com.au 
mwp@esec.com.au

Abstract 

A strong information security policy is a foundation for successful
and sustainable security outcomes. Too many organisations
treat online security as a purely technical challenge, relying on ad
hoc measures to drive planning and behaviour. Inevitably, this creates
security “shear” across the enterprise, giving rise to points of
weakness which invite attack and often yield to compromise.

A security policy that is to be both effective and embraced within 
an organisation is a necessary but elusive goal. This paper presents 
several useful structures and techniques that have been effective 
in engineering workable policies for hostile environments. 

1 Strategic Overview 

The one who figures on victory at headquarters before even 
doing battle is the one who has the most strategic factors on 
his side. 

- Sun Tzu, The Art of War

Most organisations have secrets, and every enterprise has information
assets that it desires to protect from improper access or manipulation.
The likelihood and severity of a compromise depend primarily
on the effectiveness and consistency of the risk management strategy
pursued. The purpose of an information security policy is to provide an
articulation of, and a framework for, that strategy. The framework may
then be used to drive planning, process, activity and review.

Historically, enterprises have secured their information assets on an
ad hoc basis, generally relying on physical security to prevent compromise.
No distinction has been made between information and property
of a more tangible form. This model has the great advantage that physical
security is generally well understood and relatively simple to implement;
however, it breaks down when there are non-physical paths by
which assets may be attacked.

The advent of the Internet has provided a multitude of such paths. 
It is now considered unusual for an organisation not to be connected 
to the ubiquitous global network. Usually these connections are made 
and managed according to a technical or commercial agenda, without a 
broad understanding that the link creates a completely new landscape 
of threat and risk. Sadly, the result is often massive compromise. 

This paper is written with the Internet very much in mind. Hence, 
while information security traditionally includes issues such as disaster 
recovery, our focus will be protection from hostile agents attacking from 
both within and without. 

2 Elements of Policy 

Assess the advantages in taking advice, then structure your 
forces accordingly... Forces are to be structured strategically, 
based on what is advantageous. 

- Sun Tzu, The Art of War 

Writing an information security policy is hard. As a consequence it
is common to find organisations that do not have one. If one does exist,
often it is inappropriate, inadequate, out of date, or simply ignored in
practice. Where there is no policy, strategy is diffuse and tactics have
unsustainable outcomes. A policy that is not broadly understood, applied
and of use in day to day risk management is even worse: it creates
a false sense of security.

Furthermore, there is no such thing as a universal security policy.
Each organisation is different, with unique assets and threats. Attempts
to fit a “one size fits all” policy generally yield poor outcomes:
like a poorly fitting shoe, the results can be both painful and permanently
damaging.

The experience of others, however, can be viably used to assist in the 
creation of a policy suited to your specific needs. This paper presents a 
number of techniques and elements that the author has found useful in 
building infosec policies. This is by no means an exhaustive tutorial on 
how to write your own policy, but it is hoped that it will serve as a useful 
guide along the way. 

You are encouraged to select ideas and techniques from this paper 
that apply to your circumstances, and to discard the remainder. 
2.1 Acceptable Security Outcomes

Therefore victory in war is not repetitious, but adapts its form 
endlessly. 


- Sun Tzu, The Art of War 

It is not uncommon to hear a layman speak of security as if it were 
a binary value: you are either secure or you are not. This can lead to 
the assumption that a risk elimination strategy is required, which is 
generally unworkable in the real world. You don’t need perfection, and 
you certainly don’t have to pay for it, so long as you achieve acceptable 
security outcomes on a sustainable basis. 

Security is a continuum, with outcomes based on investment and 
tradeoffs. Simply put: 

cost = security × function

The cost of any solution is a product of the desired levels of security 
and function. Very cheap security is possible in the presence of simple 
functionality. Conversely, making a complex system secure often proves 
quite expensive. 

By clearly articulating your acceptable security outcomes, you can
reduce the problem space to a cost versus function tradeoff. Furthermore,
choosing a realistic level of acceptable outcome gives you implementation
flexibility.

2.2 Risk Management 

Act after having made assessments. The one who first knows 
the measures of far and near wins - this is the rule of armed 
struggle. 


- Sun Tzu, The Art of War 

The backbone of your security policy is your risk management process,
the purpose of which is to reduce the likelihood of asset compromise.
A compromise has occurred when an asset is subject to unauthorised:

• Access, copying, modification or destruction. 

• Reservation or utilisation. 

• Denial of access. 

A good risk management process should include the following steps: 
1. Asset Identification: The information assets that you are protecting
must be clearly identified. It is important not to confuse
the container of information, be it person, computer or filing cabinet,
with the asset itself. Each asset should be assigned an overall
value, based on the following components:

• Price: How much will it cost to replace? 

• Sensitivity: What is the cost of a compromise (this can far 
exceed the price, in certain circumstances)? 

• Desirability: How much is it worth to others? 

2. Threat Assessment: Assets are subject to distinct threats, each
of which must be clearly assessed. Each threat should be assigned
an overall severity, based on the following components:

• Motivation: What is the purpose of the attackers? How determined
are they?

• Capability: How skilled are the attackers? 

• Resourcing: How well are the attackers resourced? How much 
time and money are they willing to spend? 

• Probability: What is the probability of the asset being attacked?

• Frequency: How often will the asset be attacked? How many 
attackers are there? 

• Deterrent: What are the consequences to an attacker of a 
failed attempt? 

3. Evaluate Risk: The overall risk to an asset is a function of its
value and its threat severity. While it is possible to create arbitrarily
complex numerical models, a coarse but simple system
often provides adequate resolution.

In the author’s experience, a simple three point scale suffices for 
most scenarios. If asset values and threat severities are rated as 
low, medium or high, then risk may be calculated from a simple 
table: 



                          threat severity
                       low      medium   high

              low      low      medium   medium
asset value   medium   medium   medium   high
              high     medium   high     high


You can weight your risk assessments, from laissez faire to paranoid,
by populating your risk matrix differently.
4. Mitigate Risk: Armed with a risk evaluation, it is straightforward
to decide whether or not the asset currently meets your outcome
criteria. For instance, anything other than a low risk might
be defined as unacceptable.

If the risk is unacceptable, a mitigation strategy must be defined. 
Adequate protective mechanisms must be deployed to reduce the 
risk to an appropriate level. This may require several iterations of 
the risk management process. 
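
The evaluation and mitigation steps above can be sketched in a few lines of
Python. This is only an illustration: the matrix values are copied from the
example table in step 3, the function names are hypothetical, and the choice
of "low" as the only acceptable outcome follows the example in step 4 rather
than any fixed rule.

```python
# Three-point scale used in the text for both asset value and threat severity.
LEVELS = ("low", "medium", "high")

# RISK_MATRIX[asset_value][threat_severity] -> overall risk.
# Values mirror the example table; repopulate it to weight assessments
# anywhere from laissez faire to paranoid.
RISK_MATRIX = {
    "low":    {"low": "low",    "medium": "medium", "high": "medium"},
    "medium": {"low": "medium", "medium": "medium", "high": "high"},
    "high":   {"low": "medium", "medium": "high",   "high": "high"},
}


def evaluate_risk(asset_value, threat_severity, matrix=RISK_MATRIX):
    """Look up the overall risk for a rated asset and a rated threat."""
    if asset_value not in LEVELS or threat_severity not in LEVELS:
        raise ValueError("ratings must be one of: " + ", ".join(LEVELS))
    return matrix[asset_value][threat_severity]


def needs_mitigation(risk, acceptable=("low",)):
    """Return True when the risk falls outside the acceptable outcomes."""
    return risk not in acceptable


if __name__ == "__main__":
    risk = evaluate_risk("high", "medium")
    print(risk, needs_mitigation(risk))
```

After a mitigation is deployed, the threat severity or asset exposure would
be re-rated and the lookup repeated, which corresponds to the iterations
mentioned in step 4.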

2.3 Information Warfare 

Military action is important ... it is the ground of death and 
life, the path of survival and destruction, so it is imperative 
to examine it. 


- Sun Tzu, The Art of War 

The Internet provides a plethora of extreme severity threats. As
such, it is an ongoing theatre of information warfare rather than a single
threat in itself. Any organisation exposed to this environment must
shape its security policy to address this fact; often the Internet should
be allowed to dominate all other security issues.

Exposure does not necessarily require direct connectivity. Email, removable
media and indirect or occasional interconnect are all potential
vectors of compromise.

The Internet is characterised by: 

• Rapid evolution of threats. 

• A large population of attackers and diverse threats. 

• Rapid promulgation of novel attack techniques. 

• A large and increasing number of vulnerabilities. 

• Software and hardware components that cannot effectively be hardened
against attack.

The sane approach to this environment would simply be not to connect,
or to vigorously quarantine exposure. Commercial and technical pressures,
however, are driving most organisations in the opposite direction.

If information warfare cannot be avoided, it is essential to understand
the enemy. Thinking like an attacker is the most potent tool you
have in ensuring that your risk management strategy is sufficient. Typical
attacker profiles are:
• Tourists: high capability, low resources, generally motivated by
curiosity and the desire to learn.

• Artists: high capability, low resources, generally motivated by challenge
and kudos.

• Vandals: high capability, low resources, generally motivated by a
desire for notoriety, and perhaps a political agenda.

• Spies: high capability, high resources, generally motivated by industrial
espionage and sabotage.

• Thieves: medium capability, low resources, generally motivated by
criminal intent.

• Script kiddies: low capability, low resources, high population, generally
motivated by entertainment and using “pre-canned” exploits.

Sophisticated threats will often masquerade as a less severe attack. 

As you write your policy, remember that attacks occur from within 
an organisation, as well as from outside. In particular, internal spies 
and thieves are not uncommon in the commercial context. 

In some ways, Internet connectivity simplifies risk management,
since threat assessment always yields high ratings. Unfortunately, this
also implies that a non-trivial mitigation plan is required for every asset,
making per asset risk management unworkable in practice.

2.4 Classification 

Therefore those skilled in military operations achieve cooperation
in a group so that directing the group is like directing a
single individual with no other choice.

- Sun Tzu, The Art of War 

By grouping assets into classes based on risk, we can avoid the need 
to deal with them on a case by case basis. Each class may then be 
associated with generic handling rules, which provide for an acceptable 
security outcome for that risk profile. The result is a dramatic reduction 
in the effort required for effective risk analysis and mitigation planning. 

In the author’s experience, a small number of classes is sufficient
to meet the needs of most commercial companies. One example follows,
although policy writers should feel free to tailor their classification
scheme to their needs.
Classification           Colour Code   Description

Level 0 (Public)         Green         Information which may be
                                       generally published.

Level 1 (Sensitive)      Blue          Compromise may cause minor
                                       and short term damage to the
                                       business.

Level 2 (Confidential)   Yellow        Compromise may cause serious
                                       and short term damage to the
                                       business.

Level 3 (Secret)         Red           Compromise may cause major
                                       or long term damage to the
                                       business.

Level 4 (Need to Know)   Black         Compromise may cause critical
                                       damage to the business.

In this scheme, colours have also been assigned to various levels to assist
in the clear labeling of assets.

The key elements of a good classification scheme are simplicity, ease 
of use and clarity. You should only have as many levels as you actually 
need. Whatever classification scheme is chosen, it should be applied 
widely and consistently across the organisation. 

2.4.1 Personnel Clearance 

One obvious application of a classification scheme is access control. Each 
person in the organisation is assigned a clearance, which grants access 
to all assets of equal or lesser classification. 

In our example, a Level 3 clearance would permit access to all assets 
classified at Level 3 or below, but not to Level 4. 

One useful variation is the introduction of compartments, which partition
assets into “need to know” groups. Clearances are assigned independently
for each group, and a person’s clearance must be both high
enough and match the compartment to gain access.

In our example, Level 4 is compartmentalised. All assets at this
level must be assigned a compartment, for instance “Level 4, Payroll”. 
Personnel cleared for “Level 4, Payroll” will be granted access, whereas 
“Level 4, Sysadmin” does not suffice. 
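
The clearance check described above might be sketched as follows. This is a
minimal illustration under stated assumptions: the `Label` type and
`may_access` function are hypothetical names, each person is assumed to hold
a single compartment, and only Level 4 is treated as compartmentalised, as in
the example scheme.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption taken from the example scheme: only Level 4 carries compartments.
COMPARTMENTED_LEVEL = 4


@dataclass(frozen=True)
class Label:
    """A classification level plus an optional compartment name."""
    level: int
    compartment: Optional[str] = None


def may_access(clearance: Label, asset: Label) -> bool:
    """Apply the two tests from the text: high enough, and matching compartment."""
    if clearance.level < asset.level:
        return False
    if asset.level >= COMPARTMENTED_LEVEL:
        return clearance.compartment == asset.compartment
    return True


if __name__ == "__main__":
    payroll_asset = Label(4, "Payroll")
    print(may_access(Label(4, "Payroll"), payroll_asset))   # cleared for Payroll
    print(may_access(Label(4, "Sysadmin"), payroll_asset))  # wrong compartment
    print(may_access(Label(3), Label(2)))                   # lower level asset
```

A real scheme would probably let one person hold several compartments; that
could be modelled by replacing the single compartment with a set.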

2.4.2 Handling Rules 

Classification obviates the need for risk mitigation on each asset. General
handling rules for each classification will meet the needs of most
scenarios, with additional rules occasionally specified for unusual assets.
The concept of a container is a major tool for specifying handling
rules. A container is simply defined as anything which holds, encompasses
or carries an asset. A container may be physical, such as a computer,
a network, a filing cabinet or a person. It may also be purely
logical, such as a file system or an encrypted tunnel.

The important attribute of a container is that it can enforce access 
control to some degree. A computer needs a password, a filing cabinet 
needs a key, and a network needs connectivity. People are particularly 
good at implementing complex access control algorithms, but may be 
less stringent and easier to confuse than devices. 

Containers also nest. A building encapsulates a LAN, which encapsulates
a computer, which encapsulates a file system. Therefore an asset
is protected by all the surrounding containers, the strongest of which
defines the overall protection provided.

If containers are classified according to the same scheme as assets 
and personnel, then a surprisingly small number of generic handling 
rules can be used to address most situations. A typical set might be: 

1. Assets must always be encapsulated by a container of equal or 
greater classification. This ensures adequate protection. 

2. Assets and containers must be audited before being placed inside 
a container of higher classification. This prevents “trojan horse” 
compromise. 

3. Containers must only be accessed by personnel of sufficient clearance,
including matching compartments where necessary.

In this case, three generic rules are sufficient to describe a highly 
flexible, easy to use regime. Most importantly, the rules are memorable 
and general and, therefore, are likely to be retained and consistently 
applied by personnel. 
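
To show how little machinery the three generic rules need, here is a rough
Python sketch. The function names are hypothetical, classification levels
are represented as plain integers (0 to 4 in the example scheme), and
compartment matching is left out for brevity.

```python
def is_adequately_contained(asset_level, container_levels):
    """Rule 1: some enclosing container must be classified at or above the asset.

    container_levels lists the classification of every nested container
    around the asset; the strongest one defines the protection provided.
    """
    return any(level >= asset_level for level in container_levels)


def requires_audit(item_level, target_container_level):
    """Rule 2: audit anything moving into a container of higher classification."""
    return target_container_level > item_level


def may_open_container(clearance_level, container_level):
    """Rule 3 (without compartments): clearance must reach the container's level."""
    return clearance_level >= container_level


if __name__ == "__main__":
    # A Level 3 asset on a Level 1 file system, inside a Level 3 computer,
    # on a Level 2 LAN: the Level 3 computer provides the protection.
    print(is_adequately_contained(3, [1, 3, 2]))
    # Moving a Level 1 document onto that Level 3 computer calls for an audit.
    print(requires_audit(1, 3))
```

Because the same levels label assets, containers and personnel, each rule
reduces to a single comparison.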

It may also be necessary to provide specific rules for particular classification
levels. For instance:

1. Assets at Level 3 or higher must be protected by public key authentication.

2. Assets at Level 4 must not be accessible remotely. 
2.5 Compromise Management

So it is said that if you know others and know yourself you
will not be imperiled in a hundred battles; if you do not know
others but know yourself you win one and lose one; if you do
not know others and do not know yourself you will be imperiled
in every single battle.


- Sun Tzu, The Art of War

Your security policy should provide for compromise management.
It is far too late for an organisation to begin planning appropriate responses
once a breach has occurred. By that stage, panic has often set
in, and good decisions are uncommon.

The key aspects of compromise management are: 

• The creation of detailed response plans prior to compromise.

• Guidelines for assigning severities to suspected and confirmed compromises.

• A general model of a compromise lifecycle, including escalation 
strategies. 

• Requirements for compromise post mortem and closure. 

2.6 Implementation 

Invincibility is a matter of defense, vulnerability is a matter 
of attack. 


- Sun Tzu, The Art of War 

Successful implementation requires that a security policy address 
the following issues: 

1. The goals and importance of the policy must be clearly articulated. 

2. The origin and degree of authority carried by the policy must be 
clear. Ideally, it should be adopted and empowered at the highest 
levels of your organisation. 

3. Whatever roles and responsibilities are required for implementation
must be defined. The role of a Security Officer, responsible
for day to day implementation of the policy, is particularly crucial,
and is recommended as a mandatory inclusion.

4. Responsibility for observing policy should be devolved to each and 
every member of the organisation. 
5. Remedies for non-compliance must be provided for. At the same
time, don’t try to make your policy a legal document, and don’t try 
to create a police force. 

6. Delegate the specification of specific processes and practices to procedural
documents.

7. Your policy is a living document. Define a change control mechanism
that is flexible enough to allow it to evolve, but not so flexible
that trivial changes are likely.

Very little has been said about technology in this paper. This is intentional:
technology is a tactical tool that should be used to implement
your policy, not to define it. Consciously factoring out technology
wherever possible highlights the deeper structure of your policy and ensures
that it is as widely applicable as possible.


3 Security Shear 

Those who use arms well cultivate the Way and keep the rules. 
Thus they can govern in such a way as to prevail over the 
corrupt. 


- Sun Tzu, The Art of War 

At the end of the day, the best security policy will come to naught if 
it does not promote and reinforce the correct day to day practices. At its 
heart, effective security is all about the behaviour of your people. 

If the behaviours are correct your security outcomes will be satisfactory,
even though errors occur. This is because correct behaviour creates
layers of self reinforcing protection, so that an inadvertent lapse
or mishap is unimportant. Conversely, without the correct behaviours,
no amount of policy, process or technology will achieve the desired goals
because you will continually undermine your own efforts.

Security shear occurs wherever there is a discontinuity between policy
and practice. Typical examples are where security is left to the “managers”
or to the “technical team”. The parts of your organisation that
are not bound by your policy will develop divergent behaviours, almost
always at cross purposes to your security strategy.

These points of discontinuity, where opposing priorities play tug-o-war,
are the weakest points of your armour. Attackers will take advantage
of the uncontrolled interface between secure and insecure culture
to compromise assets that they wouldn’t otherwise have any chance of
reaching.
The remedy is to ensure that your security policy is adopted at every
level of your organisation, and by every single person. There are several 
things you can do to promote this: 

• Make the policy easy to understand, remember and use. This 
means it must be short. 

• Make the security outcomes relevant. 

• Train your people in how the policy works and how to apply it. Run 
regular refresher courses and ensure it is a part of new employee 
orientation. 

• Ensure that senior management leads by example. Your policy 
has no chance if there is an aristocracy who believe themselves 
exempt. 

• Provide positive feedback for correct behaviours as well as negative
feedback.


4 Conclusion 

Making the armies able to take on opponents without being 
defeated is a matter of unorthodox and orthodox methods. 

- Sun Tzu, The Art of War 

Every security policy must be unique, following the philosophy, logic 
and needs of its organisation. The path to successful outcomes begins 
by building your policy. 

Despite the difficulty of this endeavour, there is significant prior art
to be drawn on which can make the task easier. Use these resources,
rather than trying to recreate them from first principles, ensuring always
that you are picking and choosing what is right for you rather
than enslaving yourself to a foreign framework. A good security policy
may contain all the ideas presented in this paper, or it may contain
none.

Regardless of what you take from this paper, persevere with your
policy. Don’t expect to get it right on the first try, and don’t expect implementation
to be easy. Sustained behavioural change is always difficult,
and the bigger the enterprise, the greater the challenge.
Igloo, the eSec Electronic Commerce Framework
Craig Smith, Jesper Peterson, Allison Foster 

eSec Limited 


Abstract 

Igloo, the eSec Electronic Commerce Framework, is a design model and software development toolkit
that can be used to create electronic services and the electronic marketplaces in which such services 
may be deployed. 

The framework specifies the rules of communication and interaction between software agents, across a 
distributed network, based on the way people interact in the business world. Agents, designed and 
developed to use the communication rules specified in the framework, can set up ongoing productive 
dialogues with other agents in order to complete a task. 

The framework defines the types of messages that agents may exchange and the meanings or intent of
these messages, but not the content of the message, or how such a message is encoded.
Introduction

Software developers and integrators are often asked to build massive business-to-business (B2B) 
applications in impossibly short time frames because of the need to get Internet products to the market 
rapidly. The Igloo Framework is pitched squarely at these projects. It can typically cut timeframes by 
two thirds and provide similar cost savings. 

Our intention, in developing this framework, is to allow easier and faster development of open and 
secure electronic commerce solutions. By emphasising human-focused design over technology, 
developers and integrators are free to concentrate on solving business issues. The framework allows for 
rapid development of lightweight, loosely coupled business to business solutions. 

What is Igloo? 

The Igloo framework specifies the rules of communication and interaction between agents, across a 
distributed network, based on the way people interact in the business world. Agents, designed and 
developed to use the communication rules specified in the framework, can set up ongoing productive 
dialogues with other agents in order to complete a task. Business rules may also be incorporated into 
agents so that they can act as an interface between existing systems and the outside world. 

The framework defines the types of messages that agents may exchange and the meanings or intent of 
these messages, but not the content of the message, or how such a message is encoded. In effect we 
have defined a simple language which agents use to communicate with one another. 

Why use Igloo? 

Electronic marketplaces are dynamic systems that involve a large number of participants and change 
constantly as corporate entities enter and leave. The nature of the goods traded in a marketplace is 
expected to evolve as the capabilities of the computer systems connected to the marketplace grow. The 
dynamic nature of electronic services places great demands on the implementation technology chosen 
to build them. 

These marketplaces present unique challenges due to factors such as:

• The dynamic nature of electronic marketplaces 

• Distribution over a network 

• Concurrency 

• Lack of common goals 

• Integration Technology. 

Current marketplaces are moving toward agent-based approaches, by defining interchange formats such 
as XML, which autonomous entities exchange. Systems are being glued together using devices such as 
web servers. We aim to push these systems to the next logical step, adopting a multi-agent approach 
that will enable the development of the third generation of commerce on the Internet. 

Igloo is strongly predicated on the notion of software agents: small, autonomous, flexible and
adaptive pieces of software that represent an organisation online. The agent-based architecture of
Igloo provides the following benefits:

• Simplifies complex systems by breaking them into parts. 

• Incremental deployment for faster turnarounds and lower costs. 

• Human focused, rather than technology driven, design. 

• Application of modern artificial intelligence techniques. 

• Open and synergistic systems. 
The framework provides the key functionality that any next generation business to business application
requires: 


• Extensible message formats. 

• Distributed, loosely coupled architecture, based on message exchanges. 

• Strong encryption and authentication, using recognised algorithms. 

• Reliable transactions. 

• Compatible with existing and emerging open standards. 

Building electronic services places great demands on both the developers and the chosen 
implementation technology. Such services are inherently distributed. They involve the participation of 
numerous organisations, and are highly dynamic. 

Given that the number of participating organisations may scale arbitrarily, the ability of participants to 
function concurrently with others is critical. The implementation technology must not restrict the 
participant’s choice of computing environment, since the business community uses a wide variety of 
computing environments. 

Systems based on software agents are inherently distributed and since agents communicate 
asynchronously via standard message forms, such systems scale well while allowing freedom to select 
computing environments that suit the participant's needs. Agents may be developed with a variety of 
programming environments from heavyweight languages such as C++, through scripting languages 
such as Perl and special-purpose environments like ASP. 

We used the middleware to re-implement our commercial real time credit card payment service, 
SecurEpayment, in a third of the time it took to build the original system. 
Architecture

What we call Agents 

We have based our framework on the interaction between software agents cooperating to provide a 
service. Since there are a number of variants on the term "software agent", we have adopted a fairly 
loose definition to allow software developers the greatest degree of flexibility during the design and 
development process. 

An “agent” is minimally defined as an entity that performs operations on behalf of another entity (either
software or human) with some degree of independence. More specifically, anything that is capable of 
acting without direct supervision, that is capable of interacting with others, and that can respond to 
changes in its environment is considered to be an agent within our framework. It is worth noting that 
this approach allows humans to be included in an agent-based system model, thus allowing extremely 
flexible system designs. 

Framework Layers

[Figure 1: Framework Layers. Two peer stacks, each comprising Agent, Message, Transport and Network layers.]


We have defined a software model as a guide for developers building infrastructure software to support 
services designed to operate within the framework. This model is a conventional layered architecture 
comprising four layers: agent, message, transport and network. 

The agent layer is analogous to the more commonly termed application layer, while the message and 
transport layers comprise the support software for the framework itself and are responsible for 
managing the delivery of messages. 

The network layer is responsible for the transmission of data between agents, whether the agents exist 
on the same machine or otherwise, and will generally map to the networking services provided by the 
chosen implementation environment. This abstract model is illustrated in Figure 1. 

The framework deliberately does not specify how messages are delivered, as we see mandating a 
delivery mechanism as a limiting factor to its adoption. It defines only the basic requirements for 
messaging, in order to relieve some of the implementation burden and to allow compliant 
implementations to interoperate more easily. 
The framework requires only a half-duplex communication link between agents to allow conversations 
to take place. Each message carries enough information to allow the receiving agent to establish 
which agent sent the message, the agent to whom it is addressed, and, should it be a response, the 
message to which it is responding. 

Messages are delivered to agents as discrete records so a constant communication stream is not 
required, and the order of a sequence of messages need not be preserved when the messages are 
delivered. 
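
As a minimal sketch of these messaging properties, the following Java shows how a reply can be matched to its request from the envelope alone, even when replies arrive out of order. The Envelope and Correlator classes and their field names are hypothetical illustrations and are not part of the Igloo toolkit.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical envelope carrying only the addressing facts the text requires. */
final class Envelope {
    final String id;         // identifies this message
    final String from;       // the agent that sent it
    final String to;         // the agent to whom it is addressed
    final String inReplyTo;  // id of the message being answered, or null
    final String content;    // opaque to the framework

    Envelope(String id, String from, String to, String inReplyTo, String content) {
        this.id = id;
        this.from = from;
        this.to = to;
        this.inReplyTo = inReplyTo;
        this.content = content;
    }
}

/** Matches replies to outstanding requests, whatever order they arrive in. */
final class Correlator {
    private final Map<String, Envelope> pending = new HashMap<>();

    /** Remembers a request that has been sent and not yet answered. */
    void sent(Envelope request) {
        pending.put(request.id, request);
    }

    /** Returns the request this message answers, or null if it answers none. */
    Envelope received(Envelope message) {
        if (message.inReplyTo == null) {
            return null;
        }
        return pending.remove(message.inReplyTo);
    }
}
```

Because each message is a discrete record that names the message it answers, the receiver needs neither a persistent stream nor ordered delivery to resume the right conversation.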

A compliant implementation is free to use any technology that is capable of meeting these basic 
messaging properties, such as web servers, email, or any of the common middleware technologies 
such as CORBA or DCOM. 

How Do Agents Communicate? 



Source Agent Destination Agent 

Figure 2: Communication between agents 

Communication between pairs of agents in the Igloo Framework is modelled on the way humans 
communicate. 

Agents communicate by exchanging messages. Rather than simply exchanging single messages, agents 
use messages in ongoing conversations with other agents. 

Message based conversations enable agents to operate in an autonomous, goal-directed, fashion. Each 
conversation is a task-oriented sequence of messages shared by two agents, with individual messages 
conveying an agent's desired action or intent at a particular point in a conversation. Conversations 
enable agents to integrate basic commerce processes into their operation, such as negotiations or 
auctions with other agents. Such conversations are driven by higher-level strategies that embody an 
agent's behaviour, as determined by the group developing the agent. 

A normal business conversation between a CD Retail Store and a Distributor negotiating an order 
of CDs could flow something like this: 


Retail Store: Could I order 10 CDs to be delivered next Wednesday? 

Distributor: I only have 5 CDs in stock. Do you want 5 delivered next Wednesday, and a further 
5 delivered Wednesday week? 

Retail Store: Could you deliver 10 CDs Wednesday week? 

Distributor: OK. Your invoice number is XXXX. 
Message Formats 

Every message has an intent, or action, which describes to the receiving agent why the message was 
sent. A message might be to inform the recipient of some fact, to request an action to be carried out, or 
to ask a query about some fact. Actions also allow the sending agent to indicate that a message it 
received was not understood (either because it was in an unknown format or simply unexpected), or to 
refuse an earlier request. By providing this information about message intent outside of the content of 
the message, agents have some flexibility in dealing with messages that cannot be, or do not need to be, 
understood. 
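
The value of carrying intent outside the content can be sketched as follows. The Action values mirror the intents named above, but the enum, the Handling outcomes and the IntentRouter policy are assumptions made for illustration, not the framework's definitions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** The intents named in the text, rendered as a hypothetical enum. */
enum Action { INFORM, REQUEST, QUERY, NOT_UNDERSTOOD, REFUSE }

/** What an agent decides to do with an incoming message. */
enum Handling { PROCESS, REPLY_NOT_UNDERSTOOD, IGNORE }

final class IntentRouter {
    private final Set<String> readableContentTypes;

    IntentRouter(String... readableContentTypes) {
        this.readableContentTypes = new HashSet<>(Arrays.asList(readableContentTypes));
    }

    /** Decides how to treat a message from its action and content type alone. */
    Handling handle(Action action, String contentType) {
        boolean readable = readableContentTypes.contains(contentType);
        switch (action) {
            case REQUEST:
            case QUERY:
                // The sender is waiting on us: answer even if we cannot read the content.
                return readable ? Handling.PROCESS : Handling.REPLY_NOT_UNDERSTOOD;
            case INFORM:
                // Nothing is owed for a fact we cannot read.
                return readable ? Handling.PROCESS : Handling.IGNORE;
            default:
                // NOT_UNDERSTOOD and REFUSE carry their meaning in the action itself.
                return Handling.PROCESS;
        }
    }
}
```

An agent written this way never has to parse unknown content in order to decide whether a reply is expected.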

The message content is deliberately left undefined by the framework, making it the responsibility of the 
agents participating in an exchange of messages to define appropriate content for the message being 
sent. Rather than attempting to define an all-encompassing set of content types, we allow developers to 
define their own content types as they see fit. Leaving message content undefined also allows 
developers to use existing technologies, such as XML, HTML, or even email. Initially the contents of 
messages may be simple marked-up forms, such as XML documents, which are a direct representation 
of the information exchanged in an existing business process. As agents become more sophisticated, 
content languages will be developed to allow them to take part in increasingly complex 
negotiations. 

By not defining a limited common set of content types, we hope to avoid the problem of second 
guessing possible future applications and technologies. Message content types are expected to evolve 
over time as the framework matures, with the most pragmatic and commonly used content types 
eventually dominating. While these pragmatic content types and languages may not be the best 
technically possible, they will be adequate for the job, will offer the flexibility and functionality desired 
by system developers and will also have the highly desirable attribute of being widely adopted. 
Conversation Protocols 

When two software agents engage in a dialogue, the exchange of messages between them often falls 
into a number of regular patterns. The agents are playing roles in a performance they have participated 
in previously, possibly in different roles, but with an identical pattern of messages. 

While it is possible for agents to determine the next message to send at each step of a conversation, this 
places a greater demand on the agent's capabilities, requiring more computing resources to be devoted 
to each deployed agent. A pragmatic approach to the problem, and one that reduces the requirements 
placed upon agents is to define a number of protocols. 

A protocol defines two or more roles, each of which can be played by one or more agents, and the 
number of agents that can play each role. At each step of a conversation the protocol specifies the 
messages sent by the various agents and the actions used in each message. The protocol may also place 
a number of restrictions on the content types used for each message. Protocols are in effect templates 
for dialogues between two or more agents. 

Request-Response Protocol 


The request-response protocol describes a simple client-server relationship between two agents. One 
agent requests a second agent to perform an operation on its behalf. Once the second agent completes 
the operation, it responds with the result. 

Two roles are defined by the protocol: a Client agent initiating the request, and a Server agent that 
responds. The Client sends an initial request message to the Server. The Server will then respond with 
either an inform message containing the result of the operation, or one of a number of possible error 
responses. 
Figure 3: Request-Response Protocol 


Igloo, the eSec Electronic Commerce Framework 






AUUG2K - Enterprise Security, Enterprise Linux 


Retail Store: Could I have a list of the Top 10 CDs? 

Distributor: Top 10 CDs 

 1 Pearl Jam - Binaural 
 2 Britney Spears - Oops...I Did It Again 
 3 Bardot - Bardot 
 4 Vanessa Amorosi - The Power 
 5 Savage Garden - Affirmation 
 6 Destiny's Child - The Writing's On The Wall 
 7 Macy Gray - On How Life Is 
 8 Killing Heidi - Reflector 
 9 Moby - Play 
10 Soundtrack - Mission Impossible II 


The Retail Store agent requests a list of the Top 10 CDs from the Distributor agent. The Distributor 
agent processes the request and returns the list. 

Three-Phase Request Protocol 

The three-phase request protocol describes a client-server relationship between two agents. It represents 
a more advanced version of the request-response protocol. 

Two roles are defined by the protocol: a Client agent initiating the request, and a Server agent that 
responds. The Client sends an initial request message to the Server. The Server will then respond with 
one of three messages. 


agree            indicating that it has accepted the operation; 

refuse           indicating that it has refused the operation; 

not-understood   indicating that it did not understand the request. 

One agent sends a second agent a request to perform an operation on its behalf. The second agent 
responds with a message indicating whether it accepted or refused the request. If it accepted the 
request, it will later respond with the result of the operation. 
Where the request-response protocol consists of only two messages, an initial request followed by a 
response, the three-phase request includes an additional step, enabling the second agent to indicate its 
acceptance before it performs the operation. 


Retail Store: Could I order 10 CDs to be delivered next Wednesday? 

Distributor: I only have 5 CDs in stock. Do you want 5 delivered next Wednesday, and a further 
5 delivered Wednesday week? 

Retail Store: Could you deliver 10 CDs Wednesday week? 

Distributor: OK. Your invoice number is XXXX. 


In an agent-based system, the Retail Store agent would send a message to a Distributor agent requesting 
the CDs to restock its store. It also specifies that it would prefer them to be delivered next Wednesday. 

On receiving the request from the Retail Store agent, the Distributor agent checks its own stocks for 
the CDs requested. Finding that it has only five copies in its warehouses, and so cannot fulfil the 
request, the Distributor determines from the Label agent how long it will take to receive the 
remaining CDs. It then responds to the Retail Store agent, indicating what it can offer. 

The Retail Store agent determines that it does not require any of the CDs urgently, as it still has some 
in stock, and asks the Distributor agent to place the entire order on back-order. 

Having placed the request from the Retail Store agent on back-order, the Distributor agent responds 
with an invoice number. 

Later, after the operation has been completed, the Server will respond with either an inform or failure 
message to indicate the result of the operation. 

By separating the acceptance of the operation out as a separate step, we are enabling the second agent 
to indicate acceptance of the operation separately from the result of the operation. Three-phase request 
also enables agents to request operations that are to take place sometime in the future. 
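
The message sequence of the three-phase request can be sketched as a small state machine. This is an illustrative reading of the protocol as described above; the class, its state names and the use of IllegalStateException are assumptions, not toolkit code.

```java
/** Tracks one three-phase request conversation and rejects out-of-sequence acts. */
final class ThreePhaseRequest {
    enum Act { REQUEST, AGREE, REFUSE, NOT_UNDERSTOOD, INFORM, FAILURE }
    enum State { START, REQUESTED, AGREED, DONE }

    private State state = State.START;

    State state() {
        return state;
    }

    /** Advances the conversation, rejecting any act the protocol does not allow here. */
    void accept(Act act) {
        switch (state) {
            case START:
                // Phase 1: the Client asks.
                require(act == Act.REQUEST, act);
                state = State.REQUESTED;
                break;
            case REQUESTED:
                // Phase 2: the Server accepts, refuses, or reports that it is confused.
                require(act == Act.AGREE || act == Act.REFUSE || act == Act.NOT_UNDERSTOOD, act);
                state = (act == Act.AGREE) ? State.AGREED : State.DONE;
                break;
            case AGREED:
                // Phase 3: possibly much later, the Server reports the outcome.
                require(act == Act.INFORM || act == Act.FAILURE, act);
                state = State.DONE;
                break;
            default:
                throw new IllegalStateException("conversation already finished");
        }
    }

    private void require(boolean allowed, Act act) {
        if (!allowed) {
            throw new IllegalStateException(act + " not allowed in state " + state);
        }
    }
}
```

The AGREED state is what distinguishes this protocol from request-response: the conversation can rest there for as long as the operation takes.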

Two-Phase Commit 

The two-phase commit interaction describes a client-server relationship between two agents. It enables 
one agent (the transaction manager) to coordinate the operation of two or more servers ensuring that 
they remain in a consistent state if one or more of the servers fail. 

The two-phase commit protocol defines two roles: a Manager agent that initiates the request and 
manages the overall transaction, and a Server agent responsible for providing a service used by the 
transaction. The protocol is broken down into two steps: 

1. The Manager requests the Server to prepare for a transaction. The Server should prepare for 
the transaction, allocating any resources required for the operation. 

2. The Manager instructs the Server to complete the transaction. As all resources required for the 
transaction should have been reserved during preparation, the Server should be able to 
complete the transaction without error and without requiring any further resources. 
Figure 5: Two-Phase Commit Protocol 

Retail Store: Could I place an order for 10 CDs? 

Distributor: Yes, we have 10 CDs in stock. Please confirm your request. 

Retail Store: I confirm that I would like 10 CDs shipped. 

Distributor: OK. Your invoice number is XXXX. 


The Retail Store agent sends a message to the Distributor agent asking whether it can place an order 
for 10 CDs. 

The Distributor checks that there are 10 CDs in stock and places a hold on those CDs. If the 
Distributor does not have enough CDs, it sends back a failure message to the Retail Store. If the 
Distributor does have enough CDs in stock, it informs the Retail Store agent that there are 10 CDs in 
stock that will be held awaiting confirmation of the order. 

The Retail Store agent then confirms the request to the Distributor agent to send the CDs. Once the 
confirmation is received, the Distributor must send the CDs. The Distributor arranges for the CDs to 
be sent and then returns an invoice number to the Retail Store. 

By implementing the request as a transaction, the Distributor can ensure that all requests are either 
completed or failed, and not left in an uncertain partially-filled state. 
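
A minimal sketch of the two steps, with a Manager coordinating several Servers, might look like the following. The Participant interface, the Stock example and in particular the rollback step (used to release holds when another Server cannot prepare) are assumptions added for illustration; the text above describes only the prepare and complete steps.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical Server role: prepare reserves resources, commit uses them. */
interface Participant {
    boolean prepare();  // step 1: reserve what is needed, or report that it cannot
    void commit();      // step 2: complete the transaction using the reservation
    void rollback();    // assumed extra step: release a reservation that will not be used
}

final class Manager {
    /** Returns true only if every participant prepared and was then committed. */
    static boolean run(Participant... participants) {
        List<Participant> prepared = new ArrayList<>();
        for (Participant p : participants) {
            if (p.prepare()) {
                prepared.add(p);
            } else {
                // One refusal aborts the whole transaction; undo earlier holds.
                for (Participant q : prepared) {
                    q.rollback();
                }
                return false;
            }
        }
        for (Participant p : prepared) {
            p.commit();
        }
        return true;
    }
}

/** Example participant: a distributor holding stock for a pending order. */
final class Stock implements Participant {
    private int available;
    private final int wanted;
    private boolean held;

    Stock(int available, int wanted) {
        this.available = available;
        this.wanted = wanted;
    }

    public boolean prepare() {
        held = available >= wanted;
        if (held) {
            available -= wanted;  // place the hold
        }
        return held;
    }

    public void commit() {
        held = false;  // the held items are shipped
    }

    public void rollback() {
        if (held) {
            available += wanted;  // return the held items to stock
            held = false;
        }
    }

    int available() {
        return available;
    }
}
```

Either every Stock ends up committed or every hold is released, which is the all-or-nothing property the Distributor example relies on.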
The Java Reference Toolkit 

The Java Reference Toolkit is the formal reference software for the Igloo Framework and is included as 
a part of the framework distribution. 

The basic four-layer model of the framework has been extended, using some of the language-specific 
features provided by Java, into the model shown in Figure 6. The Java toolkit provides software support 
for the message and transport layers, and assumes the presence of a TCP/IP networking environment. 


Figure 6: Java Reference Toolkit Layers (Agent, Message with optional transform modules, Transport, Network) 

The four primary layers of the abstract model are present in the Java toolkit and retain the same 
responsibilities. The message layer has been extended to allow for optional transform modules, while 
the transport layer allows the encapsulation of network interfaces into dynamically loadable modules. 

Transform modules provide additional flexibility for the agent developer by allowing automatic 
transformation of message content from one form to another, and back to the original form again, as 
part of the delivery process. Possible uses for transform modules include tasks such as compression and 
encryption of messages. 
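
To make the idea of a transform module concrete, here is a hedged sketch of a compression transform built on the standard java.util.zip classes. The Transform interface and its method names are hypothetical; the toolkit's actual module interface is not described in this paper.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical transform contract: apply on send, reverse on receipt. */
interface Transform {
    byte[] apply(byte[] content);
    byte[] reverse(byte[] content);
}

final class CompressionTransform implements Transform {
    /** Compresses message content before it is handed to the transport layer. */
    public byte[] apply(byte[] content) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(content);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!deflater.finished()) {
                out.write(buffer, 0, deflater.deflate(buffer));
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    /** Restores the original content on the receiving side. */
    public byte[] reverse(byte[] content) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(content);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("content is not valid compressed data", e);
        } finally {
            inflater.end();
        }
    }
}
```

An encryption transform would have the same shape, with apply encrypting and reverse decrypting, so the agent code on either side never sees the transformed form.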

The message layer also has support for typical agent conversations, referred to as the ‘Protocol Layer’ 
even though it does not actually form an independent layer in the implementation. The agent 
interactions currently supported by this layer are Request/Response and Two-Phase Commit. 

The transport modules each encapsulate the semantics of a specific protocol and are loaded on demand 
by the transport layer software when messages are to be transmitted. The use of individual modules for 
transport protocols allows agents the freedom to select the most suitable protocol for a particular 
conversation. 

Depending upon the destination of a message, an agent may choose to transmit a message via HTTP, 
email, direct TCP connection over a socket, or any other protocol for which a transport module has 
been developed. It is in fact possible for a message to be sent with one protocol and for a response 
message to be received with a completely different protocol. 
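
The on-demand selection of a transport module can be illustrated with a small registry keyed by the scheme of the destination address. All names here (TransportModule, TransportRegistry, deliver) are hypothetical, and the modules in the example only return strings rather than performing network I/O.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical transport contract; a real module would open a connection. */
interface TransportModule {
    String deliver(URI destination, String message);
}

final class TransportRegistry {
    private final Map<String, Supplier<TransportModule>> factories = new HashMap<>();
    private final Map<String, TransportModule> loaded = new HashMap<>();

    /** Makes a module available without instantiating it. */
    void register(String scheme, Supplier<TransportModule> factory) {
        factories.put(scheme, factory);
    }

    /** Instantiates the module for a scheme the first time a message needs it. */
    TransportModule moduleFor(URI destination) {
        String scheme = destination.getScheme();
        Supplier<TransportModule> factory = factories.get(scheme);
        if (factory == null) {
            throw new IllegalArgumentException("no transport module for " + scheme);
        }
        return loaded.computeIfAbsent(scheme, s -> factory.get());
    }

    /** Number of modules instantiated so far. */
    int loadedCount() {
        return loaded.size();
    }
}
```

Since the module is chosen per message from the destination address, nothing prevents a request leaving over one protocol and its response arriving over another.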
Hello World Client Agent example 

import java.io.IOException; 

import esec.lib.PropertiesPlus; 
import esec.message.*; 
import esec.protocol.*; 

public class HelloWorldClient 
{ 
    public static final void main(String[] args) 
    { 
        try 
        { 
            // simple client, no configuration 
            ProtocolLayer.configure(PropertiesPlus.EMPTY); 
        } 
        catch (IOException e) 
        { 
            e.printStackTrace(); 
            System.exit(1); 
        } 

        // set up for conversation 
        RequestResponse rr = new RequestResponse(); 

        // create an empty request 
        Message request = new Message(); 

        // set recipient 
        request.setTo("tcp://localhost:8010"); 

        // send request and wait for response 
        Message response = rr.requestResponse(request); 

        if (response == null) 
        { 
            // no response: report the error recorded by the conversation 
            System.out.println("Request failed: " + 
                               rr.getErrorMessage()); 
        } 
        else 
        { 
            // print the text carried in the body of the response 
            System.out.println(response.getAVBody().get("text-message")); 
        } 

        rr.done(); 
    } 
} 
Hello World Server Agent example 


import java.io.IOException; 

import esec.lib.PropertiesPlus; 
import esec.transport.TransportServer; 
import esec.message.*; 
import esec.protocol.*; 

public class HelloWorldServer implements RequestHandler 
{ 
    public boolean handleRequest(Message request, Message response) 
    { 
        // respond with message containing 'hello world' 
        response.getAVBody().set("text-message", "Hello World!"); 

        // INFORM == success 
        response.setAct(RequestResponse.INFORM); 

        return true; // close connection after responding 
    } 

    public static void main(String[] args) 
    { 
        try 
        { 
            // simple agent does not require properties 
            ProtocolLayer.configure(PropertiesPlus.EMPTY); 

            // any request will generate the same response 
            RequestResponse.setDefaultRequestHandler( 
                new HelloWorldServer() 
            ); 

            // configure tcp port 
            PropertiesPlus tcpProps = new PropertiesPlus(); 
            tcpProps.setProperty("port", "8010"); 
            TransportServer.configure("tcp", tcpProps); 

            // start the server 
            ProtocolLayer.start(); 
        } 
        catch (Exception e) 
        { 
            // report the failure, shut the protocol layer down and exit 
            e.printStackTrace(); 
            ProtocolLayer.stop(); 
            System.exit(1); 
        } 
    } 
} 




Future 

XML can currently be sent only as plain text and must be handled specially by each agent 
implementation. It is intended that the agent toolkit will provide hooks for parsing and constructing 
valid XML-encoded messages. 

Directory services for agent discovery will most likely be provided via LDAP support. This will be 
provided either directly in the toolkit, or via a directory service gateway which can be readily 
accessed via the toolkit. 

The current message format is 'heavily inspired' by RFC822/MIME. In future the messages will 
become fully compliant with these standards. Note that the current message format can be delivered via 
SMTP and such delivery is independent of the actual message format. 

Another future development is 'agent primitives' that will act as higher-level building blocks for 
commerce systems, such as brokers, auctions and payment systems. 
Abstract 

Secure database access is a common problem in customer-to-business 
eCommerce, particularly where there is a business requirement for 
the data to be created and updated by customers. 

This paper explores some solutions to the general problem, 
including suggestions regarding network architecture and 
application design. 


1 Introduction 

There are several models for eCommerce, including business-to-business (B2B), 
business-to-consumer (B2C) and consumer-to-business (C2B). For the purposes of this paper, B2C 
transactions are defined as those where a business sells an item to a consumer and C2B 
transactions are defined as those where a business obtains information from a consumer that is 
integral to the service being sold by the business to the consumer. 

Examples of eCommerce supporting C2B transactions include: banking via the web where the 
customer may transfer funds; placing advertisements via the web where the customer may 
modify or remove their advertisement; and requesting courier services via the web where the 
client may request tracking information and modify the destination. In these examples, 
information is shared between the business and the client. The business must ensure that the 
client is correctly identified and cannot repudiate their action. 

As the desire for services to be web-enabled grows, so will the need for transactions where 
clients supply and later change information required by a service provider. 

2 Assumptions and Definitions 

This paper assumes that the reader is familiar with the concepts of confidentiality, 
authentication, integrity, non-repudiation, firewalls, filtering routers, DMZ, encryption, 
encrypted sessions, digital certificates, digital signatures and certification authorities. 1 

Confidentiality, authentication, non-repudiation and integrity may be provided by public key 
infrastructure (PKI) which comprises encrypted sessions, digital certificates and digital 
signatures. It is assumed that all transactions are done over an encrypted session. 

It is assumed that businesses have a unique digital certificate and digital signature issued by a 
trusted certification authority. Thus, B2B transactions involve an authenticated vendor and an 
authenticated buyer. Transactions cannot be repudiated by either party. 
Consumers may or may not have a digital certificate. Thus, a business may be authenticated to 
a customer who has not been authenticated to the business. This means that the customer is 
assured of the identity of the business and that the business cannot repudiate the transaction, but 
the business has no such guarantees. 

In B2C transactions, this is considered to be an acceptable risk because the consumer needs to 
be authenticated only for payment purposes and where payment is made via credit card, all risk 
is transferred to the bank which issued the credit card. 

However, in C2B transactions, this risk is not acceptable. The business requires that the 
consumer may legitimately modify or destroy information previously supplied by themselves, 
so all consumer transactions must be authenticated and non-repudiable. 

3 Existing Solutions for Similar Business Systems 

Problems associated with WWW applications have often been solved in some measure for a 
different medium because the underlying business system is the same. 

Authentication and non-repudiation issues have been addressed, apparently to the satisfaction of 
many utilities and financial institutions, for sharing information between a customer and service 
provider via the telephone. Transactions, such as updating personal information including 
address and contact number by telephone, typically rely on the caller knowing the customer’s 
date of birth and street address or providing a password or PIN. These items of personal 
information can be obtained easily and many people choose easily-guessed passwords. 

Existing solutions for similar business systems do not provide strong authentication or sufficient 
non-repudiation. 

4 Solution Requirements 

A good solution must meet business requirements and conform to good security practices. The 
business requirements of C2B transactions are: 

• preserve the integrity of database contents; 

• provide customers with the ability to create, view, modify and delete information 
regarding services to be supplied; 

• ensure that information can be modified only by the customer that created it; 

• guarantee non-repudiation; and 

• provide quick response times. 

Good security practices for eCommerce installations include: 

• an appropriately configured firewall, comprising filtering routers and a proxy gateway; 

• change management for all systems involved, including the database server, the web 
server, the firewall and filtering routers; 

• provision of a test environment where patches, upgrades and configuration changes can 
be tested before being applied to the production environment; 

• provision of a staging server and automated update mechanism; 

• installation of a documented minimal operating system on all hosts involved; 

• application of all appropriate security and recommended patches, software upgrades and 
firmware upgrades in a timely manner; 

• the use of tools including Tripwire and intrusion detection systems; and 

• applications and scripts which test and restrict all user input appropriately. 
5 Network Architecture 

5.1 Baseline 

It is assumed that the eCommerce infrastructure includes a database server, an internal network, 
a firewall, a WWW server and an untrusted network. 



5.2 Recommended Solutions 

One solution is to have a master database server on the trusted network and a copy or subset of 
the database on the DMZ. Queries are made over the copy, giving a better response time than 
queries run over the master database. 

The WWW server provides a user interface for updating and removing information from the 
database; see Section 7, Application Design, for notes on input constraint verification. 
These modifications are stored as requests on the WWW server, which the master database 
server polls many times a second. When a request is found, the modification is applied to the 
database master and propagated to the database copy. 
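
The polling arrangement can be sketched in a few lines of Java. This is an in-memory illustration only: the class names are hypothetical, maps stand in for the master database and the DMZ copy, and a queue stands in for the request store on the WWW server.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** One pending modification; a null value stands for a removal. */
final class UpdateRequest {
    final String key;
    final String value;

    UpdateRequest(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

final class MasterPoller {
    final Queue<UpdateRequest> pendingOnWebServer = new ArrayDeque<>();
    final Map<String, String> master = new HashMap<>();
    final Map<String, String> dmzCopy = new HashMap<>();

    /** One poll cycle, initiated from the trusted side; returns the number of requests applied. */
    int pollOnce() {
        int applied = 0;
        UpdateRequest r;
        while ((r = pendingOnWebServer.poll()) != null) {
            apply(master, r);   // the master is updated first
            apply(dmzCopy, r);  // then the change is propagated to the copy
            applied++;
        }
        return applied;
    }

    private static void apply(Map<String, String> db, UpdateRequest r) {
        if (r.value == null) {
            db.remove(r.key);
        } else {
            db.put(r.key, r.value);
        }
    }
}
```

The point of the design is the direction of the connection: the WWW server never writes to the master, it only leaves requests for the master to collect.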

There is a fairly obvious race condition in this solution in that a user may request an update, 
then request a query before the update has completed. To combat this, it is recommended that 
the business set customer expectations for update turnaround appropriately. 

The expected update turnaround can be described as the sum of the update poll latency, data 
transfer times to the master and slave database servers and the database storage times. For 
updates polled 100 times a second over a 10Mbps network and 200 database stores per second, 

update turnaround = latency + 2 × (data transfer + data store) = 10 + 2 × (0.025 + 5) = 20.05ms 

Compare this with a query request over a public network where transport speeds average about 
200Kbps. A 5KB query would take 25ms. 
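
The turnaround arithmetic can be checked with a few lines of Java. The class and method names are illustrative; the 0.025ms transfer time is taken directly from the figures above rather than derived from a payload size.

```java
/** Worked version of the update turnaround estimate; all times in milliseconds. */
final class UpdateTurnaround {
    static double turnaroundMs(double pollsPerSecond, double transferMs, double storesPerSecond) {
        double latencyMs = 1000.0 / pollsPerSecond;  // 100 polls per second gives 10ms
        double storeMs = 1000.0 / storesPerSecond;   // 200 stores per second gives 5ms
        // One transfer and one store each for the master and for the copy.
        return latencyMs + 2 * (transferMs + storeMs);
    }
}
```

Calling turnaroundMs(100, 0.025, 200) reproduces the 20.05ms figure, and makes it easy to see that poll frequency and store rate dominate the estimate.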





Precautions should be taken to ensure that the DMZ database server can be restored or rebuilt 
quickly and the copy of the database should include a minimal set of confidential information to 
mitigate loss in the event of a successful intrusion. 
The preferred solution for businesses which need to protect their infrastructure as well as data is 
to have a dual-firewall architecture, utilising the processes described above. The benefits of 
placing the database master server behind a separate firewall are: the firewall rulesets can be 
optimised for update traffic; the trusted network is protected if the database master server is 
compromised; and the dual-firewall architecture provides additional defence-in-depth. 



5.3 Other Solutions 


One option is to store the database entirely on the DMZ. This is not a good solution because the 
master database server is not protected from the WWW server. The entire database contents 
may be corrupted in the case of a successful security intrusion. 
Another option is to allow query connections inbound to a database server on the trusted 
network. This exposes the trusted network to the risk of attack from the database server in the 
event of a successful intrusion on that host. 
6 Authentication 

Digital certificates are the best authentication mechanism at this time. However, most 
individuals do not have one due to price considerations. 

A business could set up its own certification authority to provide all clients with certificates 
cheaply. The risk associated with this is that all authentication is subverted if the host providing 
certification is compromised. 

Other authentication mechanisms include reusable passwords, multiple keys (e.g. date of birth 
and address) and machine-generated passwords supplied to the customer at record creation time. 

It is assumed that each record is assigned a unique ID so that customers can identify the correct 
record to be modified at a later date. 

7 Application Design 

The web application may include user authentication via a reusable password sent over an 
encrypted session, e.g. https, or via a proprietary server/client handshake which requires client 
software to be installed on the customer’s machine. A unique identifier and machine-generated 
password may be created for each record, thus enabling the application to authenticate the 
creator of the record for future modification. 
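
As a hedged sketch of the per-record identifier and machine-generated password, the following Java uses the standard SecureRandom and UUID classes. The class name, the password alphabet and the choice of a UUID as the record identifier are assumptions made for illustration.

```java
import java.security.SecureRandom;
import java.util.UUID;

/** A unique record identifier paired with a machine-generated password. */
final class RecordCredentials {
    // Characters that are easily confused when read aloud or typed are left out.
    private static final String ALPHABET =
            "ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz23456789";
    private static final SecureRandom RANDOM = new SecureRandom();

    final String recordId;
    final String password;

    private RecordCredentials(String recordId, String password) {
        this.recordId = recordId;
        this.password = password;
    }

    /** Issues fresh credentials at record creation time. */
    static RecordCredentials issue(int passwordLength) {
        StringBuilder sb = new StringBuilder(passwordLength);
        for (int i = 0; i < passwordLength; i++) {
            sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return new RecordCredentials(UUID.randomUUID().toString(), sb.toString());
    }
}
```

The customer would be shown both values once, when the record is created, and asked for them again before any later modification.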

The CGI scripts or binaries which accept user input for the creation or modification of data must 
provide strong input verification. To achieve this, a set of constraints must be defined for each 
input field. That is, the maximum number of characters and the acceptable character set 
should be defined for each input field. Only acceptable characters should be stored and the 
input string should be truncated if it exceeds the maximum number of characters. 
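
A per-field constraint of this kind can be sketched as follows. The FieldConstraint class is a hypothetical illustration written in Java rather than as a CGI script, and it applies the two rules in the order given: unacceptable characters are dropped, then the result is limited to the maximum length.

```java
/** The constraint for one input field: a maximum length and an acceptable character set. */
final class FieldConstraint {
    private final int maxLength;
    private final String allowed;  // every character the field may contain

    FieldConstraint(int maxLength, String allowed) {
        this.maxLength = maxLength;
        this.allowed = allowed;
    }

    /** Keeps only acceptable characters and stops once maxLength of them are stored. */
    String sanitise(String input) {
        StringBuilder kept = new StringBuilder();
        for (int i = 0; i < input.length() && kept.length() < maxLength; i++) {
            char c = input.charAt(i);
            if (allowed.indexOf(c) >= 0) {
                kept.append(c);
            }
        }
        return kept.toString();
    }
}
```

For example, a ten-digit telephone field defined as new FieldConstraint(10, "0123456789") stores only digits, so quotes, semicolons and other characters with special meaning to a database or shell never reach storage.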

8 Summary 

Customers must be strongly authenticated to ensure non-repudiable C2B transactions. This is 
best achieved via digital certificates. 

Constraints should be defined for all user input required for creation or modification of data and 
the data should be verified against these constraints in all instances. 

A number of appropriate network architectures have been presented. 

Standard security precautions are also recommended for eCommerce installations: install a 
minimal operating system on all hosts involved; verify all changes in a test environment; 
employ a staging system for upgrades, patches and configuration changes; utilise change 
management; keep all software up-to-date; apply all appropriate security and recommended 
patches; use file system integrity checkers and intrusion detection systems; and ensure that all 
scripts test and restrict user input appropriately. Protect the database via filtering routers and a 
firewall, which are also covered by change management. 
10.1 WWW References 


Certain references above are available on the world-wide web as follows: 


Reference   URL 

[Smi93b]    http://www.auscert.org.au/Information/Auscert info/Papers/Selected Aspects of Computer Security in Open Systems.html 

[Tri00]     http://www.tripwire.com/literature/221datasheet.pdf 


1 For more information about these concepts, please see Sections 2.1 and 2.6 of Selected Aspects of Computer 
Security in Open Systems [Smi93b], Public Key Cryptosystems, Certificates, and Certification Authorities 
[Smi93a] and CCITT Recommendation X.509 [ISO92]. 

2 Tripwire tests file system integrity. For more information, see [Tri00]. 
e-smith: A Server/Internet Gateway for the Non-wizard 

Gordon Rowell 
Charlie Brady 

ABSTRACT 

The e-smith server and gateway is a co-ordinated suite of free software components 
which converts a standard PC into an easy to install and operate network appliance. 
Added to a trimmed-down RedHat Linux installation are an automated installation 
process, a console configuration program, and a web based management interface. The 
internals are modular and easily extensible. The end result is inexpensive, efficient, 
reliable and versatile. 


1. Introduction 

An increasing proportion of the workforce are information workers, and are equipped with desktop 
computers. Businesses, community organisations and indeed homes have a universal need to network these 
computers to share resources, including the now indispensable Internet connection. 

Many technologies are used to provide this sharing - they all have their drawbacks. Peer networking is too 
unreliable and rapidly turns into a maintenance nightmare as the number of computers on the local network 
increases. Proprietary server technology is expensive. Network appliances are inflexible and quickly 
become obsolete. *nix server solutions are deemed too complicated to manage without years of technical 
training. 

The e-smith server and gateway seeks to solve these problems by assembling the world’s best Open Source 
software and combining it with a coherent and easy to use management system. Other products also do this, 
but few or none are as open and extensible as e-smith. All e-smith software is released under the GPL. 

2. The e-smith server in a nutshell 

The e-smith server is a Linux server providing some or all of the following services: 

— SMTP mail transport 

— POP and IMAP mailbox access 

— SMB (Windows) file and printer sharing 

— Appletalk over IP (Mac) file and printer sharing 

— DHCP 

— DNS 

— Apache Web Server 

— LDAP (Directory service) 

— ftp server 

— IP masqueraded Internet connection 

The server operates either as a LAN based single-homed server, or as a dual-homed server and Internet 
gateway. There is support for either PPP or ethernet connection to the Internet (in gateway mode). DHCP 
can be used for IP address allocation on the external interface (for example, with cable modem services), 
and there is support for a variety of dynamic DNS services. 

3. The e-smith design philosophy

"Make it as simple as possible, but no simpler"

- Albert Einstein 

The prime design goal of the e-smith server is to minimise the number of configuration parameters without
reducing the set of services that are provided. In other words, to make it as simple to configure as possible.

Other important design goals are to make it very easy to install and to be very secure. An important
secondary goal is to make the addition of new services both possible and easy.

4. The e-smith installation process

Installation of an e-smith server is achieved by booting from a floppy disk with the installation CDROM in
place [1], accepting the licensing conditions by typing "accept", then giving the go-ahead for the
installation process by typing "proceed". The installation is then fully automated; after five to twenty minutes
(depending on system speed) the installation media can be removed, and the server rebooted into a "factory
standard" configuration.

The automated installation is achieved by use of the RedHat "kickstart" feature. The "kickstart"
specification determines a set of installation parameters (CDROM as source medium, default time zone, no mouse,
etc.), the disk partition layout, the set of packages (in RPM format) to install, and finally a "finish script" to
configure the system ready for initial reboot.

5. The console GUI 

The console GUI is a text based menu system with the following options: 

1. Check status of this server and gateway 

2. Configure this server and gateway 

3. Review configuration 

4. Test Internet access 

5. Reboot or shut down this server and gateway 

6. View support and licensing information 

Of particular interest is the "Configure" option. This provides the entry point into a configuration "wizard",
through which the system administrator configures the server’s basic parameters: server only or
server/gateway mode, local network addresses, domain name, system name, internet connection type,
internet connection parameters, etc. At the conclusion of the configuration data entry, the new data is used to
configure all services, ready for the server to be rebooted into its operating mode.

6. The web interface 

All day to day management of the system is performed via a password protected web interface accessible 
only on the local network (complete remote administration is possible if SSH is installed). The standard 
manager web interface controls a variety of system configuration settings, including such operational 
parameters as user and group lists, email aliases, printers and shared file areas: 

Security 
Password 
Remote access 
Local networks 
Configuration 
Date and time 
Time server 
Workgroup 
Directory 
Printers 
Email retrieval 
Other email settings 


1. The forthcoming 4.0 release will support installation from a bootable CDROM.

Collaboration
Email aliases 
User accounts 
Groups 

Information bays 
Virtual domains 
Miscellaneous 
Create starter web site 
Review configuration 
Backup or restore 
Reboot or shutdown 
Support and licensing 

7. Under the covers 

So how does all of this work? The system configuration is stored in a set of databases which make up the 
configuration database. All operational files are generated from templated configuration files, which are 
expanded with values from the configuration database whenever changes are made. The web interface and 
console configuration utility make changes to the configuration database and then call a set of action scripts 
which apply the changes, expand the templates and restart daemons as required. Nearly all of the code is 
perl. 

7.1 The configuration databases 

The system configuration is stored in a set of databases. These are currently a set of flat files, but changing
to another storage format would be quite easy, and will probably eventually need to be done, to provide
better concurrency, performance and scalability.

The configuration databases are tied to a perl hash by a custom module. Within all perl code the
configuration and account data are accessed as perl associative array variables:

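    use esmith::config;

    # Tie the main configuration database to a hash.
    my %conf;
    tie %conf, 'esmith::config';

    # Tie the accounts database, stored in its own file, to a second hash.
    my %accounts;
    tie %accounts, 'esmith::config', '/home/e-smith/accounts';

    # Reads and writes go through the tie to the underlying file.
    $conf{'SambaServerName'} = $sambaServerName;
    my $ip = $conf{'ExternalIP'};

    # Walk every entry in the accounts database.
    while (my ($key, $value) = each %accounts)
    {
        # ...
    }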
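
The same idea, a dictionary whose reads and writes go straight through to a flat file, can be sketched outside Perl. The following is a minimal illustration in Python and is not the e-smith implementation: the class name FlatFileConfig and the one "key=value" pair per line layout are assumptions made for the example, since the on-disk format is not described here.

```python
from collections.abc import MutableMapping
import os


class FlatFileConfig(MutableMapping):
    """Dictionary-like view of a flat file holding one key=value pair per line.

    Hypothetical stand-in for the esmith::config tie; the file format is an
    assumption, not the real e-smith format.
    """

    def __init__(self, path):
        self.path = path

    def _load(self):
        data = {}
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as fh:
                for line in fh:
                    line = line.rstrip("\n")
                    if line and "=" in line:
                        key, value = line.split("=", 1)
                        data[key] = value
        return data

    def _save(self, data):
        # Write to a temporary file and rename it into place, so that a
        # reader never sees a half-written database.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as fh:
            for key in sorted(data):
                fh.write(f"{key}={data[key]}\n")
        os.replace(tmp, self.path)

    def __getitem__(self, key):
        return self._load()[key]

    def __setitem__(self, key, value):
        data = self._load()
        data[key] = str(value)
        self._save(data)

    def __delitem__(self, key):
        data = self._load()
        del data[key]
        self._save(data)

    def __iter__(self):
        return iter(self._load())

    def __len__(self):
        return len(self._load())
```

Re-reading the file on every access is slow but keeps the sketch honest about the concurrency limits of flat files that the text mentions; a different storage format could sit behind the same mapping interface without changing callers.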


7.2 The web management interface 

The web-based manager is a set of CGI scripts stored below the /etc/e-smith/web directory. This directory 
is organised to support multiple web based user interfaces (called "panels"). The base system includes two
standard panels: the e-smith-manager panel (which requires the administrator password and which provides 
system administration features) and the e-smith password panel (which can be accessed without password, 
and which allows users to change their own passwords). 

The e-smith-manager panel supports many standard functions, and the navigation and index frames are
dynamically generated, so that additional functions may be easily added.

7.3 The event handling system

After collecting and validating user input, the web interface scripts update the configuration database, then
execute a program "/sbin/e-smith/signal-event" to activate the new configuration information. The
"signal-event" program is passed an event name as an argument, sometimes with an additional argument, for
example a username or a group name.

An event scripts directory structure is used, which was inspired by the System V init system. The action
scripts are all contained in the /etc/e-smith/events/actions directory. These action scripts are, however, never
directly executed. Each named event has a directory associated with it which is populated with a set of
symbolic links, each to an appropriate action script. The "signal-event" program simply executes each of these
symlinks, using the same parameter list with which it was called. The symlinks are executed in
lexicographic order, so the names of the links are chosen so that the desired execution order is achieved.

For example: 

/sbin/e-smith/signal-event user-create charlieb 

would create the user "charlieb". The information required to properly create the charlieb account is found
in the configuration database, and each action script from the /etc/e-smith/events/user-create directory is
called, with the event name, "user-create", and the account database key, "charlieb", as parameters.

Here is the event directory for "user-create":

[/etc/e-smith/events]$ showlinks user-create/* 

user-create/S15user-create-unix -> ../actions/user-create-unix
user-create/S25ldap-update -> ../actions/ldap-update
user-create/S35create-user-homedir -> ../actions/create-user-homedir
user-create/S45create-user-welcome -> ../actions/create-user-welcome
user-create/S55create-user-notify -> ../actions/create-user-notify
user-create/S60email-update-user -> ../actions/email-update-user
user-create/S90create-fullname-alias -> ../actions/modify-fullname-alias

The "user-create" event occurs in seven phases: the creation of the user account, the update of LDAP
information for the user, creation of the user home directory, sending of a welcoming email notice, etc.

This structure facilitates the creation of new events, and the easy attachment of new actions to existing 
events. In the example given, each action is part of a different RPM - the email and ldap packages, for 
instance, could be removed or modified without affecting the other actions. 
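
The dispatch step described above is small enough to sketch. The following Python function is an illustration only, not the real signal-event program: the function name, the events_root parameter and the decision to keep going after a failing action are assumptions.

```python
import os
import subprocess


def signal_event(events_root, event, *args):
    """Run every action linked into <events_root>/<event>, in name order.

    Each action receives the event name followed by any extra arguments,
    mirroring the calling convention described in the text. Returns a list
    of (link name, exit status) pairs. Continuing after a non-zero exit
    status is an assumption of this sketch.
    """
    event_dir = os.path.join(events_root, event)
    results = []
    for name in sorted(os.listdir(event_dir)):   # lexicographic order
        action = os.path.join(event_dir, name)   # normally a symlink
        status = subprocess.call([action, event, *args])
        results.append((name, status))
    return results
```

Because ordering comes only from the link names, a package can attach a new action to an existing event by dropping one more symlink into the event directory, without touching this dispatcher.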

7.4 The configuration file system 

A critically important part of the e-smith system is the system of templated configuration files. 

The templated configuration system allows operation of the e-smith server without any knowledge of the 
underlying technology. The user makes policy choices, and sets some parameter values, and a consistent set 
of configuration files are instantiated which implement the requested policies, using the given parameter 
values. This effectively distils the configuration expertise of the e-smith system designers for later
application by less technically sophisticated users. But, as we have found in our own work, it also allows
sophisticated users to configure systems both quickly and consistently.

This configuration discipline is achieved without compromising flexibility. The template files are available 
on the system and can be edited to make absolutely any custom change. Moreover, the current design 
explicitly facilitates packaged additions to configuration files, and local custom changes. So let’s look at the 
details. 

The perl language, and particularly the module Text::Template, provides a very powerful template
expansion mechanism with very little effort. Text::Template allows interpolation of variable text into literal text.
The variable text is produced by evaluating perl code fragments contained within the template inside pairs
of braces ('{}').

In the e-smith system, the variable space of the perl code fragments is initialised with the contents of the
configuration database. The generated text will therefore be a function of the contents of the configuration
database.
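
To make the expansion step concrete, here is a deliberately reduced sketch in Python. It is not Text::Template, which evaluates arbitrary perl inside the braces; this version handles only two fragment shapes, a bare variable such as { $Name } and a double-quoted string with variables interpolated such as { "$A/$B" }. The function name expand and the sample values in the usage note are assumptions.

```python
import re

_FRAGMENT = re.compile(r"\{(.*?)\}", re.DOTALL)   # text between a pair of braces
_VARIABLE = re.compile(r"\$(\w+)")                # a perl-style scalar name


def expand(template, conf):
    """Replace each {...} fragment with a value computed from conf.

    conf plays the role of the configuration database: every key is
    visible to the fragments as a variable of the same name. An unknown
    variable raises KeyError.
    """
    def interpolate(text):
        return _VARIABLE.sub(lambda m: str(conf[m.group(1)]), text)

    def evaluate(match):
        code = match.group(1).strip()
        if len(code) >= 2 and code[0] == code[-1] == '"':
            code = code[1:-1]          # drop the surrounding quotes
        return interpolate(code)

    return _FRAGMENT.sub(evaluate, template)
```

For example, expanding the line "workgroup = { $SambaWorkgroup }" against a database in which SambaWorkgroup is set to some workgroup name yields that name in place of the braces, so the generated file is a function of the database contents and nothing else.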

The storage format of the configuration templates is the key to the facilitation of modular and custom 
changes to the templates. 

The templates are stored in a directory tree below /etc/e-smith/templates which shadows the directory tree
of the root file system. For instance, the template for /etc/smb.conf is found at
/etc/e-smith/templates/etc/smb.conf. At that location there will either be a simple file, or a directory containing a
set of files. The template to be expanded comprises either the contents of the file, or the concatenated
contents of the set of files from the directory.

To allow customisation, and to clearly isolate customisations from package installed templates, a second 
template tree is rooted at /etc/e-smith/templates-custom/, and templates found in that tree are preferred to 
the standard templates. 
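
The lookup rules just described can be sketched as follows. This is an illustrative Python function, not the e-smith code: the name read_template is hypothetical, joining directory fragments in sorted name order is an assumption, and so is the choice to let a custom template replace the standard one as a whole rather than fragment by fragment.

```python
import os


def read_template(target, standard_root, custom_root):
    """Return the unexpanded template text for a target such as /etc/smb.conf.

    A template under custom_root is preferred to one under standard_root.
    The template is either a single file or a directory whose files are
    concatenated (in sorted name order, which is an assumption).
    """
    relative = target.lstrip("/")
    for root in (custom_root, standard_root):
        path = os.path.join(root, relative)
        if os.path.isfile(path):
            with open(path, encoding="utf-8") as fh:
                return fh.read()
        if os.path.isdir(path):
            parts = []
            for name in sorted(os.listdir(path)):
                with open(os.path.join(path, name), encoding="utf-8") as fh:
                    parts.append(fh.read())
            return "".join(parts)
    raise FileNotFoundError(f"no template for {target}")
```

Keeping local changes in a separate tree means a package upgrade can overwrite the standard templates freely, while the customisation continues to win at lookup time.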

For example, let’s consider in more detail the template for /etc/smb.conf: 

[/etc/e-smith/templates]$ ls -1 etc/smb.conf/*
etc/smb.conf/template-begin
etc/smb.conf/10globals
etc/smb.conf/20netlogon
etc/smb.conf/50homes
etc/smb.conf/60netlogonshare
etc/smb.conf/60primary
etc/smb.conf/90ibays
etc/smb.conf/template-end

[/etc/e-smith/templates]$ cat etc/smb.conf/10globals

workgroup = { $SambaWorkgroup }
interfaces = 127.0.0.1 { "$LocalIP/$LocalNetmask" }

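And here is the action script code to instantiate that template:

    use esmith::config;
    use esmith::util;

    my %conf;
    tie %conf, 'esmith::config';

    #------------------------------------------------------------
    # Configure Samba server.
    #------------------------------------------------------------

    esmith::util::processTemplate(\%conf, "/etc/smb.conf");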

8. Adding a service 

The above features combine to allow new services to be added without change to any existing code or data.
A new service, say, a caching news server, can be added simply by installing two RPMs - the unmodified
news cache software itself, and an e-smith configuration package for that software, comprising a web
interface module, configuration templates, and action scripts to configure and activate the service. These all
communicate through the configuration database, possibly using newly defined configuration items.

9. A call for help - East Timor 

The first public release of the e-smith server was in April 1999. The second release was in August 1999,
and by this time the authors were actively involved in development, as we were assisting Community Aid
Abroad to set up Linux network servers in all of their offices.

We had already decided that e-smith was a great choice for the small Community Aid Abroad offices, but
had yet to complete the local modifications we considered necessary, when the September East Timor
crisis called for urgent action. We had a few days' notice to package the software for distribution - and
neither of us was to be on-site to assist with installation, configuration and debugging!

On the eve of the installation, Charlie discovered a simple modification to the installation boot floppy which
allowed the server parameters to be preconfigured. George Grisancich, the Community Aid Abroad IT
manager who did the server installation in Darwin, had only to insert a CDROM and boot floppy, type
"accept" and "proceed" and then leave the server to install and configure itself. Gordon had a mail waiting
for postmaster@darwin.caa.org.au and got it back later that day with no intervention on our part.

When the Darwin installation was declared a success, we were told that Dili would be next. George
organised for the hardware to be delivered to Darwin and we installed that server on the Darwin network, again
with minimal intervention. We then connected using SSH via the Darwin server, and added the extras we
would need for the Dili link: a custom mail transport system which stripped non-essential mail headers,
then compressed messages, before delivering them to the CAA National office in Melbourne, via UUCP
over satellite phone.

We were able to test everything except the satellite phone link in Darwin. When the server was shipped to
Dili and connected to the satellite phone, the email went through at the first dial attempt!

10. Future directions 

We see a number of directions in which the e-smith server technology will develop, both from the vendor, 
and via community contribution: 

— The server currently is rather opaque - it is necessary to log in as root on a console to view log files or 
system performance parameters. Performance monitoring interfaces will be developed. 

— The server code will undergo a thorough security audit, and changes will be made to the security 
architecture. The current design is known to be insecure if third party CGI scripts are installed. 

— Performance and scalability will be improved. 

— The LDAP schema will be extended. 

— Tape or network backup and restore will be supported. 

— VPN solutions will be provided - e.g., SSL web server support, gateway to gateway VPN, roaming 
access. 

— The code base will be internationalised, and localisations in a number of languages developed, as a 
stimulus to contributed localisations.

— Databases and web scripting languages such as PHP will be integrated. 

— Additional services will be developed - fax, calendar/scheduling, collaboration, mailing lists, mailing 
list archives, webmail - the possibilities are literally endless. 


The authors are major contributors to the development of e-smith and have installed e-smith systems in a 
number of not-for-profit organisations, providing an inexpensive, reliable and easy to manage service. They 
will both be travelling to Canada shortly after AUUG2K to join the e-smith development team. 

The main e-smith web site is http://www.e-smith.net and e-smith software by the authors can be found at 
http://e-smith.gormand.com.au 

Gordon Rowell <Gordon.Rowell@gormand.com.au> 

Charlie Brady <Charlie.Brady@nlc.net.au>

Abstract

Building on experience with a general-purpose notification
service, we describe the design and implementation
of a second-generation content-based messaging
system. Elvin4 includes a novel security framework,
internationalisation, a powerful subscription language,
and a modular pluggable protocol stack.

We discuss its evolution from previous versions, differences
from related work, and describe the transition in
underlying ideology from notification service to
content-based routing and the effect this has had upon the
design.

1. Introduction 

Mechanisms such as RPC, message queues and multicast
can all be termed directed communication models:
the destination of a message is specified at the time it is
sent. The destination can be made more transparent
through the use of a name (and name server) or a group
identifier (in the case of multicast), but it remains the
sender’s responsibility to direct the message.

This requirement causes difficulty in situations where 
the sender does not know the destination, when it is 
constantly changing, or when the number of recipients 
varies. A common solution is to introduce an explicit 
agent or proxy at a known address to which the sender 
always delivers the message. This agent then handles 
the message distribution. 

Elvin3 [SA97] implemented an alternative mechanism.
It provided a means of content-based addressing, sending
simple structured messages and allowing receivers
to use a subscription language to select messages of
interest, with a mostly-transparent router process taking
the place of the explicit third party.
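
A toy model may help fix the idea. In the Python sketch below, which is not the Elvin API, a message is a dictionary of named fields and a subscription is an ordinary predicate function standing in for an expression in the subscription language; the class and method names are hypothetical.

```python
class Router:
    """Toy content-based router: subscribers register predicates, not addresses."""

    def __init__(self):
        self._subscriptions = []   # list of (predicate, callback) pairs

    def subscribe(self, predicate, callback):
        """Ask for every future message for which predicate(message) is true."""
        self._subscriptions.append((predicate, callback))

    def notify(self, message):
        """Deliver message to each matching subscriber; return how many matched."""
        delivered = 0
        for predicate, callback in self._subscriptions:
            if predicate(message):
                callback(message)
                delivered += 1
        return delivered
```

The sender calls notify without naming any recipient; whether zero, one or many receivers see the message depends only on its content and on the subscriptions registered at that moment.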

Over three years of deployment, this model of network
programming has been proven simple, flexible, and
performant over a range of application areas both within
our own organisation and by external clients. However,
deployment has also highlighted the need for additional
features and exposed some flaws in the protocol design
that we have tried to overcome in developing the next
version, Elvin4.

In the next section we describe Elvin3 with particular
focus on its limitations. We introduce some related
work, examine some of the applications in which it has
been used, and identify strengths and weaknesses
leading to the design goals for Elvin4.

We then discuss the Elvin4 protocol and its
implementation, before reflecting upon the transition in our
ideology between a notification service and a content-based
routing infrastructure, and finally discuss our plans for
future work in this area.

2. Elvin3 

Elvin3 was an attempt to demonstrate that content-based
addressing was a viable model for distributed
inter-process communication. It was deliberately
simple, particularly at the programming interface, and did
not attempt to provide a full set of features, omitting for
example any security mechanism.

2.1. Protocol 

The Elvin3 protocol used a long-lived connection
between client programs and the routing daemon. It
was based on TCP/IP, with a custom marshalling layer
supporting six packet types (see figure 1). All packets
were sent asynchronously; the connection was assumed
to be reliable and there were no acknowledgement or
response packets.

The absence of acknowledgements required that all
packets were semantically verified within the client
library. This was especially onerous for subscription
requests: the client library had to parse the supplied
expression and this parser added substantially to the
size and complexity of the library code. Frustrating
differences between regular expression libraries in
different operating systems and languages also caused
problems with the server rejecting expressions accepted by
the client.

The marshalling layer itself was a source of problems:
it had several outright bugs and caused significant loss
of floating point precision. Whilst easily fixed,
compatibility with deployed clients quickly became more
important than correctness. These problems were the
consequence of the failure to use an existing, standard,
marshalling protocol.

2.2. Quenching 

An important feature of the Elvin3 design was the
introduction of quenching. Quenching provides a
mechanism for clients to determine whether subscribers
are interested in their messages, and enables clients to
reduce (and thus, quench) the message traffic by sending
information matching only current subscriptions.

What quickly became apparent was that while the idea
was successful, the implementation did not scale to the
number of subscribers we were supporting. After being
enabled (by the control packet, see figure 1), whenever
a subscription was changed, and at most once every ten
seconds, a quench packet was sent to all clients with
quenching enabled. However, the quench packet
contained a complete copy of the subscription database,
and the transmission time of the packets in some cases
exceeded the minimum period between updates.

In addition, the subscription database was sent as a
boolean expression in string format. This proved very
unwieldy for programmers: it had to be parsed to
extract the required information, and this made the task
of writing a well-behaved, high-volume client
unreasonably difficult.
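
The producer's side of quenching can be sketched in a few lines of Python. This is a simplified model and not the Elvin3 protocol: subscriptions are reduced to dictionaries of required field values rather than boolean expressions in string form, and all names are hypothetical.

```python
def matches(subscription, message):
    """True if every field the subscription requires has the same value."""
    return all(message.get(k) == v for k, v in subscription.items())


class QuenchingProducer:
    """Producer that suppresses messages nobody has subscribed to."""

    def __init__(self, send):
        self._send = send            # callable that transmits a message
        self._subscriptions = []     # latest copy received from the server

    def on_quench(self, subscriptions):
        """Replace the local copy, as when a full quench update arrives."""
        self._subscriptions = list(subscriptions)

    def publish(self, message):
        """Send only if at least one known subscription would match."""
        if any(matches(s, message) for s in self._subscriptions):
            self._send(message)
            return True
        return False
```

Note that on_quench replaces the whole local copy each time, which mirrors the scaling problem described above: the cost of every update grows with the size of the entire subscription database, not with the size of the change.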

2.3. Implementation 

The protocol implementation consisted of a C client
library and server, both written for a modern Unix
environment. Solaris, Linux, OSF/1, HP/UX, AIX and
Ultrix were all used at various stages, but the
requirement for a POSIX threads implementation made
deployment problematic for *BSD and some older
machines. Similarly, the use of thread cancellation
prevented a port to Windows NT.


Client libraries were provided for a variety of
languages. Some were native implementations of the
protocol, and in particular, the Java mapping was pure Java
from its first incarnation. The Python mapping started
out using the C library under a thin wrapper, but was
later rewritten entirely in Python.

Supported languages included Java, Python, TCL,
VisualWorks Smalltalk, and Emacs LISP, with external
work on Allegro LISP and Lambda MOO also
undertaken. Notable for its absence here is PERL. We had
several attempts at writing a direct PERL mapping, but
the use of threaded callbacks in the C API made it
difficult, and PERL users were constrained to using
command-line utilities to send and receive messages.

2.4. Administration 

Deployment of Elvin3 was complicated by the conflicting
requirements of our funding bodies and general
users. We supplied the system as a bundle of all
components, together with their dependencies, which made
for easier installation on a raw machine, but tended to
cause conflicts with the average well-stocked /usr/local.
The integration was pervasive, and unbundling the
Elvin components proved too difficult.

Once installed, few problems occurred for typical sites
with the exception of clients locating the server. While
Unix sites tend to share filesystems, making a central
configuration file relatively simple, Windows machines
are typically installed with private copies of
applications. We opted to rely on the DNS, using a
well-known host name to locate the default Elvin server
machine for a network. This proved quite difficult for
many sites, where the addition of a CNAME record for
a protocol the DNS administrators had never heard of
took some negotiation.

Finally, the server’s reporting and control interface was
constrained by the lack of security for messaging. We
relied instead on the security mechanisms of the host
machine, using signals and log files for server
management. This made remote administration increasingly
difficult as we deployed more servers across our
network. Additionally, the available statistics were only
minimally useful and made no provision for service
metering, capacity management or QoS monitoring.

2.5. Summary 

Despite all these issues, it is important to state that 
Elvin3 had a number of significant strengths. 

Most importantly, the fundamental concept of
content-based addressing has proven successful. Elvin3 has
demonstrated that the concept is useful in a range of
application areas, is feasible to implement with
adequate performance, and is quickly comprehensible to
programmers of varying skill levels.

The programming APIs are simple and, together with
the wide range of supported languages, this has
minimised the programmer’s learning curve. As a result
people were likely to use Elvin when they wanted to
write something quickly, or integrate existing
components.

Alongside this local development and observation, we
have observed a wider trend towards undirected
communications in particular, and messaging in general.
The commercial adoption of publish/subscribe and its
benefits for application architecture have been matched
by research interest in notification services and novel
addressing mechanisms.

3. Related Work 

Elvin provides a means of transmitting unacknowledged
messages between distributed processes. It is
unlike RPC (and remote method invocation), stream
protocols like TCP and multicast protocols ranging
from raw IP multicast through the various mechanisms
for reliable group communication. Due to its use of
content-based addressing and the requirement that
messages are only delivered to connected clients, it is also
distinct from message queuing and similar
store-and-forward systems.

3.1. Notification Services 

Elvin3 shares most features with what are often called
notification services. This section introduces a
selection of similar services, and describes them briefly. We
pay particular attention to the features that differ from
those of Elvin.

3.1.1. Keryx Notification Service 

Keryx [Low97] is a Java notification service, designed
by HP Labs in Bristol. It uses an elegant on-the-wire
transfer syntax called Self-describing Data
Representation (SDR) to describe both messages and
subscriptions. Subscriptions are SDR expressions conforming
to a restricted grammar, the Default Filtering Language
(DFL). DFL predicates are comprised of type tests of
SDR elements, arithmetic operations, list operations on
compound values and boolean combinators on
sub-expressions.

Keryx messages are structured as name-value pairs in
SDR. The values range from simple data types to lists
and nested name-value maps. The underlying transport
protocol is TCP-based and quite simple. It is also
possible to send SDR messages over alternative transports,
although the available implementation does not include
this feature.

3.1.2. CORBA Notification Service 

After specifying a channel-based event service
[OMG95], the OMG developed an extended
specification, the CORBA Notification Service [CNS99], to
provide filtered channels. A CNS message object can be
one of three types: a CORBA Any, a statically typed
CORBA object, or a Structured Event comprised of a
type, some filterable name-value pairs, and a
non-filterable payload. Subscribers connect to a channel,
and may register a filter on the message object’s type,
Any value, and name-value pairs.

CNS provides a means to federate channels into a
routing network for events and to specify various qualities
of service on a channel, such as persistence or
reliability. All communications within CNS are based on
CORBA method invocations.

3.1.3. Gryphon 

Developed at IBM’s TJ Watson Labs, Gryphon
[ASSAC99, BCMNSS99, BKSSST99] is an ambitious
system that maps a subscription database to a network
of underlying brokers that distribute the messages. The
subscription evaluation includes security and
administrative filtering attributes and is being extended to
provide additional services, such as storage and
forwarding, within the broker network.

3.1.4. XmlBlaster 

An open source development, XmlBlaster [XmlBlaster]
uses an XML syntax to describe messages consisting of
a filterable header, an opaque body, and a system
control section. Filters, in the form of XPath [XPath]
expressions, are evaluated over the header to select
messages for delivery to subscribers.

XmlBlaster also includes a message queuing system
within the same framework, with the message control
section indicating whether it is directed to a set of
destinations, or published for access by subscription.

3.2. Other Systems 

A range of other systems, both notification-style and
more general message-oriented middleware (MOM),
have been developed. They range from large-scale
enterprise application platforms [TSS95, Tal-SS,
IBM-MQ], to desktop buses for inter-application
co-ordination [OPSS93, Sun93]. While not considered
here, we have investigated a wide range of these during
the development process.

4. Applications 

While content-based addressing enables a new class of
applications, the rate of change in programmer mindset
from the more traditional IPC/RPC/messaging has
meant that many early programs do not fully utilise the
power of the mechanism. As an example, our most
widely used application, Tickertape, is basically a
channel-based application, built using a content-based
infrastructure. However, as programmers’ experience
has matured, we have begun to see the unique abilities
of content-based addressing being utilised.

This section introduces a selection of Elvin3
applications, and describes their use of the system.

4.1. Tickertape 

Tickertape [FPSK98, PFKS98] is a lightweight,
tailorable desktop tool that provides an interface to
transient information via a single-line horizontally scrolling
message window (like a stockmarket ticker). It is used
as a filter tool for public information sources (Usenet
news, CNN and ABC stories, netcomics etc), as a chat
tool, and is localised with various information sources
(CVS, RCS, web hits, downloads, email notification).

It uses Elvin messages of the form:

    to        lunch
    message   12:30 Staff Club?
    from      sara

Figure 2: An example message.

which are displayed like this in the scroller:

Figure 3: Tickertape Scroller.

The Tickertape client has the ability to subscribe to 
groups by name and filter the messages for a group by 
the content of the other fields. 
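
As an illustration of subscribing to a group by name and then filtering on the other fields, the Python sketch below builds a predicate over messages shaped like the example in Figure 2. It is not the Tickertape client or the Elvin subscription language; reading the "to" field as the group name is an interpretation of the figure, and the function and parameter names are hypothetical.

```python
def group_subscription(group, keyword=None, sender=None):
    """Build a predicate over messages with 'to', 'message' and 'from' fields.

    group   -- required value of the 'to' field (taken to be the group name)
    keyword -- optional text that must appear in 'message', ignoring case
    sender  -- optional required value of the 'from' field
    """
    def predicate(msg):
        if msg.get("to") != group:
            return False
        if keyword is not None and keyword.lower() not in msg.get("message", "").lower():
            return False
        if sender is not None and msg.get("from") != sender:
            return False
        return True
    return predicate
```

A user interested in everything sent to the lunch group would use group_subscription("lunch"), while group_subscription("lunch", keyword="staff club") narrows the same group down by message content.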

Tickertape became extremely popular, in part because
Elvin3’s simple APIs made it easy to publish local
information. This was not without some
unfortunate side-effects (tickerspam!) and led to a need
for us to address access control for communications
between ourselves (private groups) as well as for
personal instrumentation (such as email subject
notification via Tickertape).

4.2. Awareness biffs 

One of the early uses of Elvin was small awareness
applications of a generic class that, for obvious reasons,
we called biffs. Originally used by xpilot gamers to
inform each other who was playing, it was quickly
repurposed as coffeebiff and has proven quite popular as a
social awareness tool.

The interface provides a coffee icon on which to click,
a one line scrolling list of current coffee drinkers, and a
count of how many people are drinking coffee. Besides
keeping our sociologists busy, it has been most
interesting from a distributed systems algorithms perspective,
leading to investigations into state maintenance and
sharing without central repositories that have
applications beyond Coffeebiff’s lighthearted domain.

4.3. System monitoring 

Filewatcher is a daemon that generates Elvin messages 
when a monitored filesystem is changed. Without oper¬ 
ating system support (now available on Linux and Win¬ 
dows), monitoring a complete filesystem is too CPU 
and I/O intensive, but Elvin’s quench facility allows the 
filewatcher to monitor only those files for which a sub¬ 
scription exists. 
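
The effect of quenching on the filewatcher can be sketched as below. This is an illustrative polling loop, not the filewatcher's code: `changed_files` is a hypothetical function, and the set of subscribed paths is assumed to have been derived from quench information by some other means.

```python
import os


def changed_files(subscribed_paths, last_seen):
    """Poll only the paths that some subscription refers to.

    `last_seen` maps path -> modification time from the previous poll and
    is updated in place. A path seen for the first time is reported as
    changed. Returns the list of changed paths.
    """
    changed = []
    for path in sorted(subscribed_paths):
        try:
            mtime = os.stat(path).st_mtime_ns
        except FileNotFoundError:
            continue  # nothing to report for a missing file
        if last_seen.get(path) != mtime:
            last_seen[path] = mtime
            changed.append(path)
    return changed
```

The cost of each poll is proportional to the number of subscribed files rather than to the size of the filesystem.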

While initially quite successful, the filewatcher became 
less useful due to the inadequacies of Elvin3 quench 
mechanism. This experience contributed to the com¬ 
plete restructure of quenching in Elvin4. 

Similar utilities were written for web and ftp server 
logs. Whenever a new record was written to the log, it 
was parsed, and the information emitted as an Elvin 
event. After our experience with the filewatcher, the 
log watchers did not attempt to quench the traffic. 

EDDIE [TM98, Mil99] is a large, general-purpose
systems monitoring tool originally written by staff at
connect.com.au for internal use. It is now freely
available (http://www.codefx.com.au/eddie/), and
optionally uses Elvin to distribute notifications of
abnormal conditions to systems staff via Tickertape or a
GSM SMS gateway.

4.4. Scoop

Scoop is a small program that dumps an entire Usenet
news feed into Elvin for subsequent subscription by
clients. Whilst it is an obvious case for quenching, we
have been using it to drive some volume testing,
emitting up to 4Gb of data in some 120K messages per day.
The service has proven extremely useful and has made
reading Usenet more focused and productive. We intend
to make this a quenching client for Elvin4.

4.5. Breeze 

Breeze [Breeze] is an event-driven workflow engine 
written at DSTC. It uses the Elvin3 Java API and mes¬ 
sage transport to build an asynchronous RPC mecha¬ 
nism. This allows the workflow engine to control het¬ 
erogeneous components, and to transparently support 
visualisation of the workflow state. 

A number of additional applications have been devel¬ 
oped using the Elvin infrastructure, both within DSTC 
and by other organisations. Those mentioned here are 
typical of their character, or represent (in the case of 
Tickertape) the focal point for an ’ecology’ of smaller 
applications sending or receiving messages used by 
other contributors. 

5. Analysis 

Each of the messaging systems has similar functional¬ 
ity to Elvin3, and yet all differ substantially in the 
details of their implementation. In comparing these 
systems, we focus on two core properties: the address¬ 
ing model, and the support for message persistence, and 
related issues of reliability. 

5.1. Addressability 

The basic focus of our work has been the means of 
addressing messages. Usually, a message’s destination 
is specified entirely by the sender. This applies to 
everything from physical mail and telephone calls, to 
message queuing systems. 

Alternatively, the sender can share the role of message 
selection by specifying a partial address using a chan¬ 
nel identifier or by providing addressable metadata 
independent of the message body, and allowing the 
receiver to further refine the traffic stream. This is sim¬ 
ilar to Usenet News, for example, where the sender 
directs messages to a group, where the receiver then 
performs further filtering on the basis of sender and 
subject, before reading the content. 

Finally, the sender can specify no destination at all,
leaving the selection of messages completely up to the
receiver. Analogies for this mechanism are somewhat
stretched, but perhaps it could be seen as being like a
search engine, where every web page in existence
(when the crawler last crawled) is addressable by its
content.

One of the benefits of using partially-directed or undi¬ 
rected communication is the reduction in coupling 
between the communicating parties [ASBBLK99]. As 
distributed systems become larger, and more subject to 
piecemeal extension and evolution, closely coupled 
interfaces become difficult to maintain. Undirected 
communication reverses the nature of addressing from 
producers pushing messages to consumers pulling 
them. Of course, while it is possible for a receiver to 
select messages solely on the basis of their originating 
address, this reversal would make little difference to 
system design. Receivers would now have to locate 
senders, and the arity and identity of communicating 
parties is still fixed; the only thing that would change is 
the direction of traffic flow. 

Channel, subject and content-based addressing schemes 
attempt to reduce the coupling between the parties in a 
communication by removing the specification of the 
involved parties’ identity. Channel-based services 
require the producer to nominate a channel from which 
subscribers may receive their messages. This is the 
least flexible scheme; the direction of messages to a 
channel restricts their visibility, effectively partitioning 
the address space, and coupling the parties via the 
channel identifier. 

Subject-based services split messages into an
addressable subject, and an opaque body. Subscribers may
select messages using filter expressions on the subject.
This is a popular scheme, implemented by
well-established commercial products such as TIBCO
Rendezvous [TSS95]. Such systems sit part-way between
channel-based and content-based schemes in their
flexibility: their address space is global (all messages are
equally visible), but the addressability is limited to a
single field, and the burden on the sender is to
maximise the exposure of information possibly useful for
selecting a message within a single, often textual, field.

XmlBlaster uses an extended form of subject-based
addressing, splitting messages into an addressable
header, and an opaque message body. The entire
message is an XML document, but the application of
consumers’ XPath subscription expressions is restricted to
the header sub-section. The rationale behind this
distinction would appear to be that the body section is
delivered to the application, but the header section is
metadata added to the body purely for the purposes of
routing. This distinction is artificial, and constraining:
a change in the basis for routing decisions could require
that additional information from the body be made
available in the header, and because of the separation,
this will require that the source program(s) be modified.

CNS has a similar mechanism: some components of the 
message objects cannot be addressed by the consumer’s 
filters, while Elvin3, Keryx and Gryphon provide com¬ 
plete addressability of the message’s content. Such con¬ 
tent-based schemes have neither restrictions on the visi¬ 
bility of messages nor restrictions on what elements of 
a message can be used for selection. 
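
The difference between the three addressing schemes can be illustrated with a small sketch. It is not taken from any of the systems discussed: the message layout and the three `*_select` functions are hypothetical, chosen only to show which parts of a message each scheme can address.

```python
import re

# One message, modelled as a flat set of named values.
msg = {"channel": "news", "subject": "comp.unix.bsd: kernel panic",
       "author": "sara", "lines": 42}


def channel_select(message, channel):
    """Channel-based: the producer's channel is the only selector."""
    return message["channel"] == channel


def subject_select(message, pattern):
    """Subject-based: a filter expression over the single subject field."""
    return re.search(pattern, message["subject"]) is not None


def content_select(message, predicate):
    """Content-based: any named value may take part in the selection."""
    return predicate(message)


print(channel_select(msg, "news"))
print(subject_select(msg, r"^comp\.unix\."))
print(content_select(msg, lambda m: m["author"] == "sara" and m["lines"] < 100))
```

Only the content-based selector can combine the author and the line count, because those values are invisible to the other two schemes.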

The application that has benefited most from this 
absence of coupling is Tickertape. The initial message 
format was defined by the subscription of the receiver 
(the scroller). Over time, features were added to the 
scroller, for example, to accept MIME attachments and 
to replace scrolling messages with new contents, but 
the addition of these fields to the message definition did 
not mean that the existing senders or receivers stopped 
working: the original version of Tickertape still func¬ 
tions (without the new features) after over 3 years of 
enhancement to the basic protocol. 

Additionally, the use of content-based addressing has 
meant that despite the basic Tickertape GUI model 
being channel-based, some users have opted to search 
the message text for keywords of interest to them, by 
registering a suitable subscription. Another collects 
and publishes a list of known groups using the quench 
facility to obtain a copy of the registered subscriptions, 
and parsing for those matching the Tickertape defini¬ 
tion. 

Similarly, the use of Elvin in Breeze enables multiple, 
independent components to receive the messages from 
the workflow engine indicating changes in state, and to 
perform various functions, from initiating the next step 
in the workflow, to driving a visualisation of its 
progress. 

5.2. Persistence and Reliability 

In discussing notification services, Ramduny et al.
[RDR98] define a pure notification service as one where
"the server is entirely separate from the datastore". This
separation of function between message routing and 
message persistence becomes our second criterion for 
comparison. Message routers, such as Keryx or Elvin3, 
deliver messages to connected clients with a matching 
subscription. Subscriptions are not retained while the 
client is disconnected, nor do matching messages accu¬ 
mulate waiting for the client’s session to be re¬ 
established. 
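
The behaviour of a pure message router can be sketched as follows. This is an assumed, in-memory model rather than the Elvin3 or Keryx implementation: the `Router` class and its method names are hypothetical, and subscriptions are represented as Python predicates.

```python
class Router:
    """Delivers each message to connected clients with a matching
    subscription; keeps no messages and no state for absent clients."""

    def __init__(self):
        self._subs = {}  # client id -> list of (predicate, callback)

    def connect(self, client):
        self._subs[client] = []

    def subscribe(self, client, predicate, callback):
        self._subs[client].append((predicate, callback))

    def disconnect(self, client):
        # Subscriptions are dropped with the session; nothing is queued.
        self._subs.pop(client, None)

    def notify(self, message):
        delivered = 0
        for subs in list(self._subs.values()):
            for predicate, callback in subs:
                if predicate(message):
                    callback(message)
                    delivered += 1
        return delivered
```

A client that reconnects must register its subscriptions again, and it never sees messages emitted while it was away.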

CNS, XmlBlaster and Gryphon all provide some level
of support for queuing messages for a disconnected
client. This suggests the notion of reliability: the server
guarantees that a message received will be stored until
the client can collect it, and this in turn leads to
qualities of service for messages. Using QoS, messages
may prioritise themselves in the queue, expire if not
retrieved by a given time, be replaced by later messages
containing updated information, etc.
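
The three qualities of service mentioned above (priority, expiry and replacement) can be sketched for a single disconnected client. This is a hypothetical model, not the queueing design of CNS, XmlBlaster or Gryphon; the `ClientQueue` class and its parameter names are assumptions.

```python
class ClientQueue:
    """Messages held for one disconnected client."""

    def __init__(self):
        self._items = {}  # key -> (priority, sequence, expires_at, payload)
        self._seq = 0

    def put(self, payload, priority=0, expires_at=None, replace_key=None):
        self._seq += 1
        # A later message with the same replace_key overwrites the earlier one.
        key = replace_key if replace_key is not None else ("seq", self._seq)
        self._items[key] = (priority, self._seq, expires_at, payload)

    def drain(self, now):
        """Return unexpired payloads, highest priority first, then oldest."""
        live = [v for v in self._items.values()
                if v[2] is None or v[2] > now]
        self._items.clear()
        live.sort(key=lambda v: (-v[0], v[1]))
        return [v[3] for v in live]
```

Even this small sketch needs per-client state that a pure router does not have, which is the source of the extra complexity.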

Naturally, this additional functionality makes the imple¬ 
mentation of the service considerably more complex. 
The necessity for persistent storage and its associated 
overhead make for a sharp distinction in the observed 
performance of those services that function strictly as 
routers, and those that provide a ’reliable’ service. 

This does not mean that content-based addressing can¬ 
not be combined with a mechanism for persistence, but 
argues for separation of the functionality. 

Some form of persistence has been a common request 
from users of Elvin. Coffeebiff is an exception to this 
rule, where the current state is the important property, 
and historical information irrelevant. However, none of 
the applications have yet required a type of persistence 
that needed additional features in the routing daemon 
for its implementation. 

Reliability is a more complex issue, clouded by the fact 
that quantitative measures appear not to be as satisfac¬ 
tory as a more general qualitative statement. The major 
issue is that of federated message routers, where the 
sender successfully delivers a message to an initial 
router from which it is then forwarded, potentially 
across huge numbers of linked systems. 

In this scenario, ’reliable’ delivery would mean taking a 
global snapshot of the combined subscription databases 
of all routers at a single point in time, and ensuring that 
those clients whose subscriptions matched the message 
acknowledged its delivery. This is obviously not scal¬ 
able. 

Content-based addressing encourages decoupling
between senders and receivers, and this proves to be the
antithesis of guaranteed reliability. Retaining Elvin3’s
best effort client semantics, together with a more
rigorous approach for inter-router forwarding, seems to be
the best approach for further development.

6. Elvin4 

Even before the issues that arose during deployment
and application development were known, we had
identified a range of features that could not be included in
Elvin3 for various reasons, and from this arose the
prospect of a next major revision. Throughout its
deployment, feature requests were usually answered
with "wait for Elvin4". Of course, there came a time
when we actually had to write it, and in doing so, select
the features to be included in the new version.

In this section we discuss the design goals for Elvin4, 
with reference to the identified problems with Elvin3, 
ideas from other messaging systems and our experience 
with applications using a content-based addressing sys¬ 
tem as discussed in the previous section. 

6.1. Design Goals 

The basic design goals for Elvin4 were:

• better protocol design and implementation 

• some additional message data types 

• some changes to the subscription language 

• internationalisation 

• a security mechanism 

• usable and efficient quenching 

• automatic server discovery 

• scalability to more clients, and beyond a single server 

Each of these goals is discussed in the context of the 
implementation below. 

6.2. Implementation 

After some analysis and much discussion, we decided 
that rewriting the system from the ground up was 
required. We retained the use of C as the implementa¬ 
tion language, after serious consideration of Java. It 
was felt that C was both portable to more platforms and 
sufficiently better suited for the task of writing net¬ 
working software to overcome the lack of garbage col¬ 
lection and standard library support available in Java. 

Development has proceeded on both Unix and
Windows NT roughly in parallel, and despite some
difficulties in mapping concepts to the different
mechanisms provided by the two systems, the code is
relatively clean.

6.2.1. Protocols 

The Elvin3 protocol was TCP-based with a custom
string-oriented marshalling. It was not modular;
replacement of the protocol meant rewriting large
sections of the server. For Elvin4, we decided that a
modular approach was worthwhile, despite the loss of
performance inherent in such an approach. This would
also allow us to support multiple protocols satisfying
different requirements.

Consequently, Elvin4 supports an abstract protocol
stack comprised of three layers: marshalling, security
and transport (see figure 4). Each layer of this stack
may have multiple concrete implementations available
within a single server. Interfaces between the layers are
well-defined, and in particular, the security layer forms
a complete encapsulation of the transport interface, to
support protocol implementations, such as OpenSSL,
where this is required.

    Server
    Marshalling   XDR, XML
    Security      SSL, Krb5, None
    Transport     TCP, HTTP
    OS

Figure 4: The Elvin4 abstract protocol stack.

We currently provide concrete marshalling layers for
XDR [RFC 1832] and XML [XML98], and are working
towards a serial protocol for embedded and handheld
devices. Security modules for SSL [SSL96] and
Kerberos 5 [NT94] are being developed, together with
a ’none’ module which simply passes messages to the
transport module, for which we have TCP, UDP and
HTTP implementations.
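
The layering described above can be sketched as three interchangeable objects. This is an assumed structure, not the Elvin4 source: the class names are hypothetical, JSON stands in for XDR or XML marshalling, and an in-memory loopback stands in for a real transport.

```python
import json


class JsonMarshaller:
    """Stand-in marshalling layer (Elvin4 itself uses XDR or XML)."""

    def encode(self, message):
        return json.dumps(message, sort_keys=True).encode("utf-8")

    def decode(self, data):
        return json.loads(data.decode("utf-8"))


class LoopbackTransport:
    """Stand-in transport layer that keeps packets in memory."""

    def __init__(self):
        self._buffer = []

    def send(self, data):
        self._buffer.append(data)

    def receive(self):
        return self._buffer.pop(0)


class NoSecurity:
    """The 'none' security module: passes bytes straight to the transport.
    It wraps the transport completely, as the security layer is said to."""

    def __init__(self, transport):
        self._transport = transport

    def send(self, data):
        self._transport.send(data)

    def receive(self):
        return self._transport.receive()


class ProtocolStack:
    """Marshalling on top of security on top of transport."""

    def __init__(self, marshaller, security):
        self._marshaller = marshaller
        self._security = security

    def send(self, message):
        self._security.send(self._marshaller.encode(message))

    def receive(self):
        return self._marshaller.decode(self._security.receive())
```

Because the stack only sees the security object, an encrypting module could replace `NoSecurity` without changes to the other two layers.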

This modularity has also led to the need to describe the
protocol stack used for a given server endpoint. We
have adopted a URL format, encapsulating the stack
description, location data for the endpoint (i.e., a host
and port), and other server properties (see figure 5).

Message and quench delivery packets remain asyn¬ 
chronous and together with message emission, unac¬ 
knowledged. The remainder of the protocol is now 
acknowledged, supporting connection management, 
registration of subscriptions and quench requests, and 
configuration of security keys. A final abstract packet 
type supports extremely simple producers, using uncon¬ 
nected message emission, for example over a UDP 
transport. 


    elvin:/tcp,none,xdr/machine.domain.com:12345;property=value;property2

    tcp,none,xdr
        The protocol stack offered via this endpoint
        (transport, security, marshalling).
    machine.domain.com:12345
        The address of the endpoint. The format depends on
        the protocol stack elements.
    property=value;property2
        Advertised server properties, used by clients for
        selection.

Figure 5: Elvin4 URL example.
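
A parser for URLs of this shape might look like the sketch below. The grammar is inferred from the single example in figure 5 and is an assumption; the real Elvin4 URL syntax may allow forms this code rejects, and `parse_elvin_url` is a hypothetical name. It assumes a host:port address, which the figure notes is specific to some protocol stacks.

```python
def parse_elvin_url(url):
    """Split an Elvin4-style URL into stack, endpoint and properties."""
    scheme, sep, rest = url.partition(":")
    if scheme != "elvin" or not sep or not rest.startswith("/"):
        raise ValueError("not an elvin URL: %r" % url)
    stack_part, sep, endpoint_part = rest[1:].partition("/")
    if not sep:
        raise ValueError("missing endpoint in %r" % url)
    transport, security, marshalling = stack_part.split(",")
    address, *props = endpoint_part.split(";")
    host, _, port = address.rpartition(":")
    properties = {}
    for prop in props:
        name, has_value, value = prop.partition("=")
        # A bare property name carries no value.
        properties[name] = value if has_value else None
    return {
        "transport": transport,
        "security": security,
        "marshalling": marshalling,
        "host": host,
        "port": int(port),
        "properties": properties,
    }


example = "elvin:/tcp,none,xdr/machine.domain.com:12345;property=value;property2"
print(parse_elvin_url(example))
```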
The adoption of XDR as the default marshalling
standard, and the change to using RPC-style interactions
for the ’administrative’ protocol functions, have
overcome the protocol-level problems from Elvin3. The
modular protocol architecture also means that fixes for
problems identified in future can be deployed while
retaining compatibility with existing clients.

6.2.2. Datatypes and Subscriptions 

As for Elvin3, Elvin4 messages consist of a set of 
named values. In addition to Elvin3’s supported data 
types of integer, string, and floating point, Elvin4 pro¬ 
vides 64 bit integers and an opaque type for binary data 
such as images or compiled code. 

In addition to a simple equality test, Elvin3 provided a 
POSIX Extended Regular Expression (ERE) matching 
operator for strings. After observing its use, we have 
introduced some simpler string matching routines: 
begins-with , ends-with and contains , with optional case 
insensitivity. While retaining the ERE operator, these 
tests cover a large proportion of our observed usage, 
and are both easier for the programmer and can be 
effectively optimised within the router. 
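
The three simpler string tests can be sketched in Python as below. The function names mirror the operator names in the text, but the signatures are assumptions and this is not the Elvin4 subscription syntax, which is not shown here. Case insensitivity is approximated with `str.casefold`.

```python
def _prepare(text, ignore_case):
    # casefold() is a stronger form of lower() intended for comparisons.
    return text.casefold() if ignore_case else text


def begins_with(value, prefix, ignore_case=False):
    return _prepare(value, ignore_case).startswith(_prepare(prefix, ignore_case))


def ends_with(value, suffix, ignore_case=False):
    return _prepare(value, ignore_case).endswith(_prepare(suffix, ignore_case))


def contains(value, fragment, ignore_case=False):
    return _prepare(fragment, ignore_case) in _prepare(value, ignore_case)


print(begins_with("Tickertape", "tick", ignore_case=True))
print(contains("12:30 Staff Club?", "Staff"))
```

Each test is a fixed-string comparison, which is part of why a router can optimise it more readily than a general regular expression.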

A second extension was to support simple arithmetic 
within the subscription evaluation engine. Some Elvin3 
applications were required to match a large number of 
messages, and perform secondary filtering which could 
be quite simply done by the server. Integer and floating 
point arithmetic and integer bitwise operations are now 
available within the subscription language. 
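
The benefit of evaluating arithmetic in the router can be shown with a small, hypothetical comparison. The message fields (`type`, `bytes`, `flags`) and both predicates are invented for illustration; they are written as Python functions rather than in the Elvin4 subscription language.

```python
messages = [
    {"type": "transfer", "bytes": 2048, "flags": 0x4},
    {"type": "transfer", "bytes": 512000, "flags": 0x1},
    {"type": "transfer", "bytes": 512000, "flags": 0x5},
]


def coarse(msg):
    """Elvin3 style: match broadly, filter again in the client."""
    return msg.get("type") == "transfer"


def refined(msg):
    """Elvin4 style: arithmetic and bitwise tests run in the router."""
    return (msg.get("type") == "transfer"
            and (msg["bytes"] / 1024) > 100      # floating point arithmetic
            and (msg["flags"] & 0x4) != 0)       # integer bitwise operation


print(sum(coarse(m) for m in messages), "delivered with the coarse test")
print(sum(refined(m) for m in messages), "delivered with the refined test")
```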

6.2.3. Internationalisation 

A large amount of effort has been made to support
internationalisation with Elvin4. The message string
data type is now UTF-8 [Unicode] encoded so that
international characters can be represented. This has
required additional subscription operators to support
normalisation and comparison strengths; however, the
language remains very similar to Elvin3 for simple
cases.

Additionally, the abstract protocol error packet contains 
a message code and list of arguments supporting the 
use of message catalogs in the client. The error packet 
also includes the message string in the server’s native 
language, to aid debugging and provide a common-case 
default for simpler clients. 
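
How a client might use such an error packet is sketched below. The packet layout, the error code 1001 and the catalog contents are all hypothetical; only the idea of a code, an argument list and a default string comes from the text.

```python
# Hypothetical client-side message catalogs, keyed by locale then code.
CATALOGS = {
    "en": {1001: "Subscription syntax error at position {0}: {1}"},
    "de": {1001: "Syntaxfehler im Abonnement an Position {0}: {1}"},
}


def render_error(packet, locale):
    """Format an error using the local catalog, falling back to the
    string supplied by the server when no translation is available."""
    template = CATALOGS.get(locale, {}).get(packet["code"])
    if template is None:
        return packet["default_text"]
    return template.format(*packet["args"])


packet = {
    "code": 1001,
    "args": [17, "&&&"],
    "default_text": "Subscription syntax error at position 17: &&&",
}
print(render_error(packet, "de"))
print(render_error(packet, "fr"))  # no catalog: server's string is used
```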

6.2.4. Security 

One of the major challenges for Elvin4 was to create a
mechanism for authorising message delivery. The
difficulty is that this authorisation forms a coupling
between the message producer and its consumers, and
it is the absence of coupling that is a key benefit of
content-based addressing.

Elvin4 controls access to messages through the use of 
one-way keys. Producers may supply a set of raw keys 
that are transformed by a one-way hash function within 
the server, prior to matching. To receive a message 
containing keys, a subscriber must supply one or more 
keys matching the transformed keys from the producer. 
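
One reading of this scheme is sketched below. It is an interpretation, not the Elvin4 implementation: SHA-256 is an arbitrary stand-in for the unspecified one-way function, `transform` and `may_deliver` are hypothetical names, and messages without keys are assumed to be deliverable to anyone.

```python
import hashlib


def transform(raw_key):
    """One-way transformation applied by the server to producer keys."""
    return hashlib.sha256(raw_key).digest()


def may_deliver(producer_raw_keys, subscriber_keys):
    """Decide whether a message may reach a subscriber.

    The subscriber must hold at least one key equal to the transformed
    form of a producer key. Holding that transformed key does not reveal
    the raw key needed to produce messages under it.
    """
    if not producer_raw_keys:
        return True  # assumption: unsecured messages are open to all
    transformed = {transform(key) for key in producer_raw_keys}
    return any(key in transformed for key in subscriber_keys)


raw = b"group-secret"
public = transform(raw)
print(may_deliver({raw}, {public}))
print(may_deliver({raw}, {b"wrong"}))
```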

While delivered messages are annotated to indicate that 
they matched a subscription key, subscribers may also 
elect to receive only secured messages. Both producer 
and subscriber key sets may be associated with the 
server connection to avoid resending the key set with 
each operation. 

The distribution of the keys to both producers and con¬ 
sumers is not managed within Elvin. Applications and 
individual sites may utilise a variety of mechanisms, 
ranging through shared filesystems, directory services 
or even smartcards, to share keys. 

Given that the possession of a key enables access to 
protected message traffic, the transmission of keys 
between the clients and server must be encrypted to 
ensure security. As described above, the Elvin4 proto¬ 
col stack supports the use of a security layer to perform 
this function. 

A more complete description of the Elvin4 security
mechanism, its limitations and their possible solutions,
is given in a forthcoming paper.

6.2.5. Quenching 

Following our experience with quenching in Elvin3, it
was obvious that this was an area requiring significant
work. There were two basic problems with the existing
mechanism:

• each quench update contained the entire subscription
database

• quench information was supplied as a raw string

To address the first issue, Elvin4 allows clients to spec¬ 
ify a filter over the subscription database, constraining 
quench updates to those subscriptions capable of 
matching messages produced by the client. The filter is 
expressed as a list of message attribute names that must 
be present in the subscription. Clients may register, 
modify or remove quench filters in a similar way to 
subscriptions. 
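
The quench filter test can be sketched as a subset check. The data model is an assumption: each subscription is reduced to the set of attribute names it refers to, and `quench_updates_for` is a hypothetical function, not part of the Elvin4 API.

```python
def quench_updates_for(filter_names, subscriptions):
    """Return ids of subscriptions that mention every name in the filter.

    `subscriptions` maps a subscription id to the set of message
    attribute names used in that subscription's expression.
    """
    wanted = set(filter_names)
    return sorted(sub_id for sub_id, names in subscriptions.items()
                  if wanted <= set(names))


subscriptions = {
    "s1": {"file", "event"},
    "s2": {"to", "message"},
    "s3": {"file"},
}
# A filewatcher-like producer only cares about subscriptions naming "file".
print(quench_updates_for(["file"], subscriptions))
```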
Updates to the subscription database cause quench
packets to be sent to clients with matching filters.
These updates describe additions, modifications and
removals from the server’s subscription database. The
use of quench filters, and the notification of relevant
changes in the subscription base, rather than delivering
a complete copy, both serve to vastly reduce the
processor and network overhead of using quenching.

In addition, the change to sending updated fragments of 
the subscription database has included the use of an 
abstract syntax tree format rather than a raw string. 
This saves the client from requiring a parser, and allows 
the server to simply forward the relevant portions of its 
internal state to the quenching client. 

The implementation of quenching requires seven
additional packets, and a substantial increase in the
complexity of marshalling to support the abstract syntax
tree components. To support lightweight clients such
as handheld or embedded devices, quenching is an
optional feature in Elvin4.

As an extension of the mechanism, it is possible to 
automate the quenching process by gathering the 
attribute names from emitted messages and building a 
suitable quench filter. The client library can then dis¬ 
card messages for which there are known to be no sub¬ 
scribers without additional code in the application. 
This auto-quench facility can be enabled via a single 
library call. 
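
The auto-quench idea can be sketched as a thin wrapper around message emission. This is a hypothetical model of a client library, not the Elvin4 one: `AutoQuenchProducer` and its methods are invented names, and subscriptions received in quench updates are represented as Python predicates rather than abstract syntax trees.

```python
class AutoQuenchProducer:
    """Drops messages locally when no known subscription could match."""

    def __init__(self, send):
        self._send = send      # function that transmits to the router
        self._names = set()    # attribute names seen in emitted messages
        self._subs = {}        # subscription id -> predicate

    def quench_filter(self):
        """Attribute names to register as this client's quench filter."""
        return sorted(self._names)

    def on_quench_update(self, sub_id, predicate):
        """Apply an addition or modification, or a removal if None."""
        if predicate is None:
            self._subs.pop(sub_id, None)
        else:
            self._subs[sub_id] = predicate

    def emit(self, message):
        self._names.update(message)
        if any(test(message) for test in self._subs.values()):
            self._send(message)
            return True
        return False           # discarded: no subscriber is interested
```

The application calls `emit` as usual; the decision to discard is made entirely inside the library.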

6.2.6. Automatic Server discovery 

Configuring clients to connect to an appropriate server 
was one of the major administrative issues with Elvin3 
deployment. In an attempt to overcome this, Elvin4 
clients may use a multicast query to locate a server. 
Multiple servers can be provisioned within a multicast 
domain, with different protocol stacks and configured 
scopes. Unconfigured clients will connect to a compat¬ 
ible server configured with the default scope, while 
more sophisticated clients may specify particular 
servers by using non-default scope names or a direct 
URL. Scopes are not especially useful with a single 
server, but are intended to provide a mechanism for 
redundancy and automating failover between clustered 
servers in a future release. 
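
The selection step that follows a discovery query can be sketched as below. No multicast traffic is involved: the replies are supplied as a list, their layout is an assumption, the default scope is modelled as an empty string, and `choose_server` is a hypothetical name.

```python
def choose_server(replies, supported_stacks, scope=""):
    """Pick the first advertised server in the requested scope whose
    protocol stack the client can speak. Returns its URL or None."""
    for reply in replies:
        if reply["scope"] == scope and reply["stack"] in supported_stacks:
            return reply["url"]
    return None


replies = [
    {"url": "elvin:/http,none,xml/gw.example.org:8080",
     "scope": "", "stack": ("http", "none", "xml")},
    {"url": "elvin:/tcp,none,xdr/a.example.org:2917",
     "scope": "", "stack": ("tcp", "none", "xdr")},
    {"url": "elvin:/tcp,none,xdr/lab.example.org:2917",
     "scope": "lab", "stack": ("tcp", "none", "xdr")},
]
client_stacks = {("tcp", "none", "xdr")}
print(choose_server(replies, client_stacks))
print(choose_server(replies, client_stacks, scope="lab"))
```

The host names and port numbers above are placeholders only.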

6.2.7. Scalability 

The Elvin3 server demonstrated that content-based 
addressing could be fast. However, its architecture was 
limited to supporting a few thousand clients, and fewer 
on platforms without a lightweight thread model. 


Elvin4 remains an extremely fast content-based router,
and has removed the dependency on lightweight
threads; however, there are application domains for
which supporting tens or hundreds of thousands of
clients is required. In these cases, it is necessary to farm
out subscription evaluation to multiple servers in a
tightly-coupled local area federation. Elvin4 contains a
mechanism for handover of client connections to
facilitate load sharing in such an environment.

The remaining research challenge is to address scalabil¬ 
ity to wide-area networks, and to provide an internet- 
scaled Elvin service. 

7. Ideology 

Elvin started as a lightweight notification service but is
now viewed by its developers as a content-based
routing service. The reasons for this change in philosophy,
for what is essentially the same service, are two-fold.

Firstly, our work on scalability has changed our view of 
Elvin from being a server to that of a service. Connec¬ 
tions are now made not to a particular server but to the 
service itself. Within the DSTC environment, multiple 
sites are seamlessly connected together using wide-area 
links so that we can use the service at multiple loca¬ 
tions as though it were a single entity. 

Secondly, this transition in philosophy is due to
thinking about the addressing model of notification services.
Traditional communication paradigms rely on
source-routed messaging; that is, the sender of the
information specifies where the information is to be
delivered. Source routing is the traditional model for RPC,
and even in multicast communication the sender
specifies the address. The inherent limitations of
source-addressed messaging have generally been
mitigated by the use of indirection (e.g., name-to-address
resolution, or trader services).

By contrast, Elvin messages are routed not by the 
sender (which simply emits an unaddressed, structured 
message) but instead by the recipient. It is the con¬ 
sumer’s subscription expression that defines and 
changes the routing of the messages and hence Elvin is 
best viewed as a Content Based Routing (CBR) service. 

8. Future Work 

We have described the interaction protocol between the
Elvin4 router daemon and the client libraries.
Currently, this is effectively limited to a LAN environment
by both performance and protocol design. Our next step
is to work towards wide-area scalability, continuing
towards an Internet-scaled content-based routing
infrastructure.



Additionally, we have many more concrete implemen¬ 
tations of the abstract protocol to implement and if 
Elvin is to be truly ubiquitous we need to extend the 
number of language bindings beyond C, Java, Perl, 
Python, and Emacs LISP. 

Much work remains to make server configuration
easier. Additionally, it must be easy to manage and
control local area federations and the wide-area links
between administrative boundaries. Another area
requiring attention is key management, if the Elvin
security model is going to be readily accessible to users.

Finally, there are many more interesting clients to be 
written, particularly now that usable quenching is avail¬ 
able. 

Availability 

Elvin is available in both source and binary form under a not- 
for-commercial-use license. Full documentation, FAQs, addi¬ 
tional software and the download itself can be found on the 
Elvin homepage 

http://elvin.dstc.edu.au/

CURRENT LEGAL ISSUES
Kimberley Heitman 

Scope of paper: Current legal issues relating to networks: 

1. Hacking 

2. Encryption 

3. Censorship 

4. Harassment and DOS Attacks 

5. Defamation 

6. Copyright 

7. Privacy 

8. Illegal content 

Introduction 

With the increasing numbers of people, businesses and Government agencies using computer networks, 
legal issues are being highlighted as differing expectations are imposed upon the Internet, ranging from 
attempts to control access to content to attempts to regulate the activities of people using the global 
networks. 

As Internet usage rises, competing interests are seeking to apply the law in traditional ways or to expand 
legal controls over international links or analogous legal contexts. In some areas, traditional legal remedies 
are being used effectively - in others the global nature of the Internet challenges cultural insularity and the 
ability of local law-enforcers to influence conduct outside their jurisdiction. 

1. Hacking

In this context I particularly refer to malicious or fraudulent behaviour rather than the milder European 
definition that includes probes of computer networks by curious youngsters. The Dutch Internet Service 
Provider, xs4all.nl, had a policy for many years of offering 6 months' free access to anyone who could 
obtain unauthorised access privileges, as a relatively benign form of security testing. However, in its 
undoubtedly criminal form, the term "hacking" is better described in terms of harmful access to private 
information leading to abuse of that knowledge or malicious damage to that data. 

Early convictions for "hacking" offences were notoriously uncertain - authorities veered between confusion
as to whether a crime had been committed and the other extreme of gross over-reaction. For example, a few
early convictions for unauthorised access to a network were prosecuted as theft of minute amounts of
electricity or calculated by reference to cost of provision of the access improperly obtained. Typically the
young offenders were placed on good-behaviour bonds for conduct involving a few dollars' theft yet liable
to penalties of many years in prison. A few unlucky hackers were "made examples of", and sentenced to
long gaol terms; many others were not prosecuted as authorities and system operators were uncertain as to
whether a law could, or should, be applied.

On the face of it, the Australian Federal Crimes Act had always provided for penalties for abuse of a 
telecommunications service, whether data or voice, using a definition of "offensive" conduct that had been 
expanded by a series of leading cases to catch any conceivable misuse. However, prosecution by the Federal 
Police implied a level of seriousness and a commitment of police resources that was not often manifested. 
During the nineties, States and Territories enacted laws independently which effectively defined two 
offences - unauthorised access to a network, and damage to the data on a network. Both carried gaol terms, 
with the latter offence typically being considered more serious as a degree of criminal and wilful damage 
was seen as analogous to property damage, and often involved expense to rectify. 

Now that these laws have been in place for some time, the necessary evidence to support a conviction and
the degree of seriousness necessary to prompt a police investigation, a prosecution and a court case have
become settled issues. It is an established principle that if considerable financial loss was occasioned, a
prosecution will issue and the perpetrator will be sentenced in accordance with tariffs applicable to any
other type of fraud, and more severely than other forms of damage. Hacking others' credit card information
is certainly gaolable, as is obtaining commercial Internet access by fraud. On the other hand, obtaining
access to particular parts of networks or subdirectories is unlikely to be punished in such a severe way.

As in many legal issues on the Internet, proving the identity of the perpetrator is often an evidential burden 
for prosecutors, since without "owner-onus" on computer systems, the identity of the user of a computer 
may evade proof beyond reasonable doubt. Anonymisers, use of web-based posting mechanisms such as 
Remarq or use of the Internet accounts of other family members add to this difficulty. 

More fundamentally, prosecution of hacking can be impractical if the perpetrator is outside the jurisdiction 
of the investigating authorities. If the offender is in another country, it is a tall order to obtain cross- 
jurisdictional assistance from the police or system operators in that foreign country - let alone to justify 
calling in Interpol. 

"Hacking" of web pages appears to be developing as a malicious prank, similar to graffiti. For example, 
following the passage of the Broadcasting Services Act, the web site of the Australian Broadcasting 
Authority was repeatedly defaced, by exploiting security deficiencies in the web server. For the system 
administrator, the prosecution uncertainties make self-help the best remedy. Adequate security on publicly- 
accessible networks is a minimum standard in an Internet-enabled world, and petty security breaches 
continue to be a means by which security precautions are tested and fine-tuned.


2. Encryption 

This is receding as a controversial legal issue as a result of the efforts of Phil Zimmermann and his
program "Pretty Good Privacy". Despite the best efforts of the United States and fellow members of the
anti-cryptology treaty "The Wassenaar Arrangement", the program was widely distributed through mirror 
sites all over the world. The program's essential feature was the ability to encrypt at any level of protection 
desired by the user, notwithstanding that military intelligence and law enforcement bodies did not want the 
public to have military-grade encryption for personal use. 

From a state-security perspective, it was an unacceptable risk for terrorists, spies and criminals to be able to 
encrypt communications so effectively that it was beyond the ability of Government to resolve to plain text. 
However, suppression of mathematics was ever a losing strategy, and Phil Zimmermann and his supporters 
made the algorithms and code required for strong encryption available to anybody for any purpose. While 
the Governments were concerned and made to look powerless as treaty obligations and aspirations were 
sidestepped, simultaneously the availability of more powerful computers and networked decryption 
programs made weak encryption useless as a security protection for e-commerce. Once the Electronic 
Frontiers Foundation had established that standard bank-grade encryption could be cracked, the pressure 
from the banking industry and online merchants to continuously raise the strength of encryption to exceed 
the capacity of unknown individuals to decrypt became overwhelming. 

Realistically, developers of encryption software for legitimate purposes are now encouraged to use effective 
means to protect the security of transactions and the privacy of transmitted data generally. Within Australia, 
a reasonable use of encryption would not lead to real problems with security agencies even if the literal 
word of the Wassenaar Arrangement is breached. In fact, Australia's toleration of exported cryptography 
products is a commercial advantage in the international marketplace that is acknowledged as Government 
policy. 

EFA can claim a major contribution to the debate with its publication of the Walsh Report, commissioned 
by the Australian Government and completed by a former security chief. The report was originally 
suppressed by the Government, then released in a heavily-edited format. Amusingly, it was discovered by 
EFA that unedited copies had been widely distributed to university libraries, and the suppressed parts were 
those which detailed why strong encryption could not be effectively outlawed. For a former security chief to 
admit the powerlessness of authorities to control private use of encryption was a telling blow to the 
Government's policy of attempting to do so, and the attempts to prosecute Phil Zimmermann in the United 
States under laws relating to the export of dangerous weapons became ludicrous as his program became 
both ubiquitous and an increasingly legitimate use of technology. Eventually the US Government dropped 
the charges, admitting at last that suppression of encryption software was impossible. 

3. Censorship 

Whether one agrees or disagrees with censorship of Internet content, there are legal issues relating to 
implementation of a censorship regime by any national Government. Early attempts to control offensive 
material on the Internet by State and Territory authorities were hampered by ad-hoc enforcement and 
imperfect analogies with classification schemes relating to offline media. In the early nineties, prosecution 
was based on subjective judgements by local authorities, leading to police action over material ranging from 
rude jokes to pictorial evidence of child sexual abuse. 

Victoria, Western Australia and the Northern Territory opted in the mid-nineties to legislate in relation to 
the obligations of systems administrators to control the availability of criminalised content, unfortunately 
raising the question as to the impact on cross-jurisdictional access to content deemed illegal in one State or 
Territory but not specifically criminalised in another. For example, WA and NT allowed networks to 
provide access to material unsuitable for children whilst Victoria did not. Following a series of Federal 
studies and reports, last year the Federal Government enacted the Broadcasting Services Act that banned 
certain material nationwide and imposed particular requirements on the operators of publicly-accessible 
networks. 

In brief, the Australian Broadcasting Authority has the power to ban any static Internet content, newsgroup 
or particular web site. System operators are not required to monitor Internet content for offending material, 
but are obliged to abide by the Internet Industry Code of Practice (http://www.iia.net.au/code.html) which 
includes obligations to inform users and content providers of their legal responsibilities and to provide 
users with one or more of a number of filtering technologies. These range from web proxy software and 
client-side web browser filters to child-oriented services such as Kidz.Net. 

Since the Act came into operation on January 1 this year there have been a handful of Australian sites shut 
down, one notice in relation to a newsgroup and a series of complaints to foreign law enforcement bodies as 
a result of complaints made by the public. The problems of enforcing local standards on the global Internet 
have been well demonstrated, with sites banned in Australia re-emerging as offshore sites within hours and 
at least half of the complaints relating to overseas sites outside the jurisdiction of the ABA. 

Interestingly, while the Government conceded that the large number of sites outside Australia made 
compulsory blacklists impractical, at present the Government appears to be relying on blacklists to enforce 
its ban on Internet gambling. This may be based on the Government's belief that there are only about 1000 
gambling sites worldwide, a figure that is an under-estimate - anecdotally there are twice that many in the 
city of Las Vegas alone. 

System operators may at least be satisfied that prosecutions for transmission of unlawful Internet content are 
likely to be uniform nationwide, and that section 91 of the Act seems to over-ride State and Territory laws 
affecting them in their capacity as operators of infrastructure. However, that any Australian can access any 
site outside Australia notwithstanding the passage of the Act remains a major problem for censorship within 
this country, and it is arguable that the Internet has instead adopted an international standard which 
effectively permits any material permitted in at least one country in the world. International police 
cooperation is limited to investigation of pictorial child abuse sites in relation to content regulation, 
although of course Internet-related terrorist activities and fraud are subject to the same international 
prosecution policies as exist for offences of those types in the offline environment. 

4. Harassment 

With more people getting online comes all of the frictions of the offline world. As the Internet becomes less 
of a curiosity and more of a basic communications resource, it is to be expected that inter-user friction will 
increase at least proportionately. Because of the relative anonymity of Internet interactions, and because people 
are not necessarily resident within the same locality, there appears to be an added element of risk-taking in 
personal expression that often leads to friction. The term "flaming" reflects the normality of robust debate, 
personal attacks and aggressive email that characterise Internet forums such as Internet Relay Chat, 
Usenet newsgroups, mailing lists and web-based chatrooms. 

Equally, the ersatz confidentiality of email communication leads to romances and friendships developing 
across national borders and going through the twists and turns of human relationships. Threats and offensive 
behaviour are arguably as upsetting in the online world as in the offline world, perhaps even occasioning an 
added degree of seriousness when coupled with computer-based attacks such as viruses, forgeries and denial 
of service attacks. 

These offences are well-covered by State and Territory laws relating to offensive behaviour of general 
application - a threat is a threat however delivered. Equally, the Federal catch-all of "offensive use of a 
telecommunications service" covers most misuse of computers, with other sections of the Crimes Act 
specifically criminalising particular types of misbehaviour. It is quite routine for restraining orders or 
prosecutions to issue when both the offender and the complainant are within the same jurisdiction, and not 
uncommon for such orders to cross State and Territory boundaries. 

Of course, persuading a foreign Government to act is difficult, rare and sensitive to differing laws of 
evidence, applicable offences and perceived degrees of seriousness. However, the laws of general 
application hold up well since the unlawful act relates to the behaviour of the offender rather than the use of 
the computer per se. It is difficult to conceive of a communication sent across computer networks of a 
threatening or harassing nature that would not equally be an offence if delivered by the postal service or via 
voice telephony. 

While harassment of an individual can be harrowing, a sustained denial-of-service attack such as a ping 
flood, packets-with-payload or a distributed DOS attack has the potential to shut down a site of any size, at 
inconvenience to any number of users. The recent publicity over distributed DOS attacks on the Yahoo site 
(among others) shows that community concern over DOS attacks is at a high level, and that anyone identified 
as a perpetrator can expect severe punishment. However, the bumbling attempts to identify the source of the 
"Love Bug" email "virus" also demonstrate that damage can cross national boundaries and that 
prosecutions may flounder under local law enforcement failings. 

5. Defamation 

Australian law generally follows the British tradition of allowing people and companies to recover damages 
for loss of reputation under the laws of libel and slander, without the exceptions relating to freedom of 
expression guaranteed under the US Constitution. 

For a system operator, the traditional laws of defamation result in liability being placed upon the network 
as a secondary publisher of defamatory material once the system operator becomes aware of that material 
and fails to immediately remove it. It is settled law that a post to the Internet or a web page may be 
defamatory, and apart from being able to sue the originator of the offending material, the defamed person 
may also sue all distributors of that material as secondary publishers, subject to those notice requirements. 

A famous online litigant, Laurence Godfrey, has successfully used the laws of defamation and the 
international legal quagmire to obtain monies by suing ISPs for permitting material alleged to be defamatory 
to continue to be available from their network. The Melbourne PCUG, New Zealand Telecom and British 
ISP Demon Internet are among his defendants who have found it cheaper to pay than to defend claims of 
defamation, especially in the context of litigation across jurisdictions. It is relatively cheap to start a 
defamation case compared with the enormous costs of proving the issue in Court - consequently it makes 
commercial sense to pay a smallish sum of damages to a Plaintiff rather than incur the expense of defending 
a trial, however meritorious the defence may have been. 

Currently an associated legal issue - liability for hyperlinks - has threatened to broaden liability for online 
defamation by including as co-publishers anyone who links to a page or newsgroup article alleged to be 
defamatory. This issue is currently before the Courts in San Francisco at the suit of one Curzon-Brown, a 
teacher who alleged that a web site defamed him. In an attempt to suppress the alleged libel, Curzon-Brown 
sought damages from anyone linking to the offending site. 

Within Australia, it is arguable that the Broadcasting Services Act section 91 actually provides a defence to 
an ISP that innocently transmits material subsequently found to be defamatory - however if notice has been 
given, it is possible that the defence fails. As "deep-linking" is a highly controversial issue in legal circles - 
is it a re-publication or just a reference - this is a moot point that may be argued in each jurisdiction in the 
future. 

Once again, cross-jurisdictional variations on the rights of free speech may lead to uncertainty as to the legal 
position in relation to material posted or hosted in other countries. Is the relevant Court located in the 
country where the material was placed online, the country where the defamed person resides or in any 
country that provides Internet access at the choice of the defamed person? Certainly a defamed person would 
prefer to have the case decided under British law rather than the more liberal attitudes towards robust free 
expression in countries such as the United States or Holland, or might prefer the corrupted legal system in a 
country without the structures we would expect from a system that embodied the rule of law. 

In the meantime this remains a key issue for operators of computer networks, best addressed by having 
rapid response systems in place to deal with complaints about hosted content. 

6. Copyright 

This year’s hot legal issue relating to Internet usage has been copyright violations, especially in relation to 
distributed piracy networks and the Napster program and service. Owners of intellectual property have ever 
been in the forefront of test cases involving computer networks, and already the Napster case has given rise 
to unusual legal precedents as national copyright laws try to tackle unauthorised distribution of musical 
works. Equally, the litigation which followed the so-called DVD crack demonstrated the vigour with which 
owners of intellectual property are prepared to defend their exclusive rights to license distribution and 
modification of that I.P. 

Like defamation, copyright violation becomes the problem of the network operator once notice has been 
given - even under the relatively modern approach taken in the proposed amendments to the Federal 
Copyright Act. The duty of a network administrator to prevent notified abuses of copyrights has already 
extended to blocking propagation of newsgroup articles and using technical means to reduce the incidence 
of such abuse. It is arguable that this duty may extend in the future to black-banning alleged offenders, 
blocking Napster traffic or sampling web pages. 

However, this sort of legal action seems to be doomed to failure if anonymised distribution systems such as 
FreeSource take hold, with infinite varieties of techniques to conceal traffic, its origins and destination. 
Furthermore, privacy rules imposed upon carriers also bind content hosts, so it is sometimes a matter of 
initiating litigation merely to require the network operator to lawfully release confidential client information 
for the purpose of locating the alleged offender(s). 

The debate over "deep-linking" will likely be resolved under current copyright litigation, with the web site 
MP3Board initiating legal action in California to seek a declaration that deep-linking does not constitute a 
re-publication for the purposes of copyright law. Most legal commentators, while acknowledging the issue 
is untested in significant jurisdictions, have guessed that the degrees of separation possible with 
hyperlinking (to the file, to the page, to the directory, to the computer, to the network) mean that the line is 
probably drawn at the level where a link results in a browser downloading the particular copyright 
infringement. As copyright violation is both a civil tort and a criminal act, an element of "mens rea" or 
"guilty intent" is required to obtain a conviction, leading to the conclusion that a link to a particular file 
demonstrates an intent to violate IP, whilst a link to a page may not. 

7. Privacy 

One of EFA's chief concerns relates to online privacy, and the extent to which connected networks can lead 
to aggregation of databases and data mining. Whilst generally EFA is of the view that existing laws 
adequately regulate online activity, privacy is a special case because of the magnitude of intrusion 
occasioned by global access and computer-assisted search facilities. The recent furore over the Crimenet 
site is one such instance, the concerns about the Packer private database company Acxiom another. Obvious 
examples of problems with aggregated databases include instances of two separate databases being 
combined and cross-referenced to the potential detriment of those listed therein - such as the example where 
a health insurance company purchases a hospital. Similarly, the sale by the Australian Tax Office or the 
Australian Electoral Commission of information obtained under compulsion of law in a form ready-made 
for spamming challenges notions of fair use of gathered data and individuals' rights to be left alone. 

At present, the Federal Privacy Bill is before Parliament and has been strongly criticised as a legitimization 
of improper privacy intrusions and a weak Bill incapable of providing a proper framework for rights of 
privacy in the digital age. Like content regulation, the Bill proposes that the principal means of enforcing 
privacy will be left to industry codes of practice - unlike content regulation there is no mechanism for 
proactive Government action against transgressors nor effective remedies against proven violators. By 
contrast, the Bill presently before the Victorian Parliament goes much further in establishing a conciliation 
and arbitration process, defining and protecting certain privacy rights, and establishing a right to sue for 
damages for breach of privacy. 

It has been suggested that the present Government, which last year rejected any form of privacy rights 
enforceable within the private sector, was persuaded to bring in a weak privacy law for fear its total absence 
would lead to trade sanctions and e-commerce being stifled. The privacy-conscious European Union has 
pressured countries such as the United States and Australia to adopt privacy laws in order to continue to do 
business with its member nations. It remains to be seen as to whether the Federal Bill will satisfy the 
concerns of international trade and e-commerce, but in any event I'd predict that the awareness of the 
erosion of privacy made possible and fast by interconnected computer networks will eventually result in a 
desire by ordinary Australians for rights of privacy enforceable against those who would profit from the 
trade in private information. 

8. Illegal Content 

While the Internet censorship debate in this country has focused on pornography, it should not be ignored 
that the borderless Internet also frustrates attempts to regulate information of other kinds. The benign 
network effect of "dis-intermediation" (cutting out the middle-man) which permits such socially-useful 
material as online professional databases also permits unqualified opinion to flourish next to traditional and 
regulated sources of information. 

Recent concerns about so-called "doc.coms" giving unqualified or controversial medical advice online can 
be translated to many other subjects. The Australian Government agency ASIC recently prosecuted the 
owner of a site "Chimes" which gave investment advice of various kinds, on the basis it was not a licensed 
investment advisor. Similar sites challenge regulated, expert opinion in many regulated professions, and 
even within regulation at a national level, cross-jurisdictional issues afflict attempts to control the free flow 
of information on the global Internet. Is a Bangkok medical clinic web site going to be shut down at the 
request of the Australian Medical Association? 

Many issues are cultural - French laws prohibit sites selling aspirin or Nazi memorabilia but tolerate radical 
politics or the advertising of tobacco products, Dutch law tolerates cannabis advertising, American law 
tolerates so-called "hate sites". In a global medium, it is difficult for any country to impose its views on its 
neighbours. While the availability of potentially-dangerous information (such as fireworks recipes) in 
offline media is for some countries reason enough not to seek an online ban, others argue that the Internet 
should be more severely regulated than public libraries because children allegedly have greater ease of 
access than in an offline environment. Other countries strive for consistency with offline restrictions under 
the assumption that because online/offline standards should be the same, they will be! 


Conclusion 

For a content provider, the issues discussed above are fundamental as to liability for criminal prosecution, 
civil liability and whether an e-business is permitted, and as such are of immediate professional interest. For a 
network administrator, the issues of concern are more likely to impact via a secondary liability for content 
hosted or transmitted, or alternatively the vicarious liability for the conduct of one’s employees. An 
employer is liable for the conduct of employees to third parties, even if that conduct was not authorised or 
permitted, where proper monitoring procedures were not put in place. 

Ultimately the challenge of participation in a global network is manifesting in the erosion of influence of 
national authorities in favour of international ones, notwithstanding that attacks and litigation from afar are 
undoubtedly greater risks. While the public, worldwide, comes to terms with the new sources of information 
and other content, the ability and the will of national Governments to continue regulation of various types of 
behaviour is reducing at the same time as individuals are becoming capable of suing citizens of other 
countries. Law, one of the world's oldest professions, now confronts the same challenge as other old 
professions: adapting to a global online future, or risking irrelevance where the laws of different 
nations substantially conflict. 


Kimberley James Heitman, 

B.Juris, LLB, AACS 

Barrister and Solicitor, Western Australia 
Chairman, Electronic Frontiers Australia Inc 
http://www.efa.org.au 


Point-to-Point Internet Connections under Linux 

Paul Mackerras 
Linuxcare, Inc. 


1. Abstract. 

Linux provides a robust and featureful PPP implementation which can be used to implement 
point-to-point internet connections. Currently there are separate independent implementations for 
different communications media such as asynchronous modems, ISDN connections, WAN 
adaptors, etc. This paper describes a new generic PPP layer which is intended to provide a way to 
unify the disparate PPP implementations as well as providing new functionality, such as the ability 
to combine multiple asynchronous links, and potentially other kinds of links as well, into a single 
logical PPP connection with increased bandwidth and reduced latency. 

2. Introduction. 

Over the past 5 to 10 years, we have seen an explosion in the number of systems connected to the 
internet. Many of these systems are in homes or in small offices, and connect to the internet via a 
point-to-point link to an ISP (internet service provider). The point-to-point link is usually a dial-up modem, 
sometimes an ISDN connection, and will almost always use PPP, the Point-to-Point Protocol. PPP 
is also used in many other situations where internet traffic is to be carried over a point-to-point link; 
for example, it can be used over optical fibres running at gigabits per second. 

PPP is actually a collection of protocols, defined by the Internet Engineering Task Force (IETF). 
The core PPP protocols are now Internet standards [1,2]. Other PPP protocols are defined in RFC 
(Request For Comments) documents (e.g. [3-5]). The PPP protocols primarily define: 

• how network datagrams should be formatted for transmission over various kinds of 
point-to-point communications links, and 

• protocols to allow two systems connected by a communication link to agree on various 
aspects of the link operation, such as authentication, IP addresses, compression, and 
encryption. 

PPP is designed specifically for the situation where two systems are communicating over a 
point-to-point link. The term ‘peer’ is used, when talking about one of the systems, to mean the 
system at the other end of the link. As the term ‘peer’ indicates, there is no client-server relationship 
required by the PPP protocol specifications, although in practice the peers may have a client-server 
relationship, particularly with respect to how the communications link is established. The process 
of establishing the link is not covered in the PPP protocol specifications. In the typical case this will 
involve sending commands to a modem to ask it to dial an ISP. 

PPP assumes that the communications link is bidirectional and that it can pass 8 bits per byte, 
although it does not always have to be completely transparent. For example, the PPP encapsulation 
for asynchronous serial links has the capability to avoid sending specific control codes by sending a 
two-byte sequence instead. 
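
As a rough illustration of that escaping, the sketch below assumes the scheme defined for asynchronous links in RFC 1662: the flag byte 0x7e, the escape byte 0x7d, and any byte being avoided are each sent as 0x7d followed by the original byte XORed with 0x20. The function names, and the simple rule of escaping every byte below 0x20, are assumptions made for this example rather than code from any PPP implementation.

```python
FLAG = 0x7E      # marks the start and end of a frame
ESCAPE = 0x7D    # control-escape byte
XOR_MASK = 0x20  # an escaped byte is sent XORed with this value


def escape_async(payload: bytes, escape_controls: bool = True) -> bytes:
    """Escape the flag byte, the escape byte and (optionally) bytes below 0x20."""
    out = bytearray()
    for b in payload:
        if b in (FLAG, ESCAPE) or (escape_controls and b < 0x20):
            out.append(ESCAPE)
            out.append(b ^ XOR_MASK)
        else:
            out.append(b)
    return bytes(out)


def unescape_async(data: bytes) -> bytes:
    """Reverse escape_async on the receiving side."""
    out = bytearray()
    pending_escape = False
    for b in data:
        if pending_escape:
            out.append(b ^ XOR_MASK)
            pending_escape = False
        elif b == ESCAPE:
            pending_escape = True
        else:
            out.append(b)
    if pending_escape:
        raise ValueError("data ends in the middle of an escape sequence")
    return bytes(out)
```

For instance, the XON character 0x11 used for software flow control would travel as the two bytes 0x7d 0x31, so a link that intercepts XON/XOFF never sees the raw control code.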

The major PPP protocols include: 

• LCP, the Link Control Protocol. LCP is used to control the overall operation and termination 
of the link, and allows the peers to negotiate things such as what form of authentication (if 
any) is to be used, and optional variations to the encapsulation and framing on the link. 

• PAP and CHAP, the Password Authentication Protocol and the Challenge Handshake 
Authentication Protocol, are used to allow a system to authenticate its peer. This 
authentication can occur in either or both directions. 

• IPCP, the Internet Protocol Control Protocol, is used to allow the peers to agree that they will 
transmit IP datagrams over the link, and on the IP addresses to use for each end of the link. 

• CCP, the Compression Control Protocol, is used to allow the peers to agree on the use of 
packet compression, and on the algorithm and parameters to be used for packet compression 
for each direction of the link. 
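
The list above does not spell out how a challenge handshake works. As a hedged sketch, assuming the MD5 variant of CHAP described in RFC 1994, the authenticator sends an identifier and a random challenge, and the peer proves knowledge of the shared secret by returning a hash over the identifier, the secret and the challenge. The function names and sample values are hypothetical.

```python
import hashlib
import hmac


def chap_md5_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """Value the peer returns: MD5 over identifier byte + secret + challenge."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()


def chap_verify(identifier: int, secret: bytes, challenge: bytes,
                response: bytes) -> bool:
    """Authenticator side: recompute the hash and compare in constant time."""
    expected = chap_md5_response(identifier, secret, challenge)
    return hmac.compare_digest(expected, response)
```

Unlike PAP, the secret itself never crosses the link in this exchange; only the hash does.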

3. PPP in Linux. 

The 2.2.x versions of the Linux kernel include several different PPP implementations, designed for 
operation over various kinds of links. These include: 

• async-ppp: PPP for asynchronous serial ports (tty devices) as well as some types of 
synchronous serial ports. 

• isdn-ppp: PPP over ISDN (Integrated Services Digital Network) lines. 

• sync-ppp: PPP over some kinds of synchronous serial WAN (wide area network) adaptors. 

• Drivers for intelligent WAN adaptors which do PPP in firmware. 

Clearly, different hardware devices need different software drivers. Nevertheless, there is 
considerable unnecessary duplication of code and functionality between these different 
implementations. 

Starting with the 2.3.x Linux kernel series, the kernel contains a generic PPP layer which provides 
an opportunity to unify all of the different PPP kernel drivers (with the exception of the drivers for 
the cards which do PPP in firmware). The generic PPP layer is described below in section 5. 

All of the PPP implementations listed above include enough code in the kernel to handle data 
packets completely within the kernel, without involving any user-space process. (The term ‘data 
packet’ means a packet sent or received by a network protocol such as IP or IPv6.) Control packets, 
that is the packets used to implement the PPP control protocols such as LCP, IPCP, CCP, etc., are 
handled by a user-space daemon in the async-ppp and isdn-ppp implementations. The sync-ppp 
kernel code includes LCP and IPCP negotiation code, but with limited functionality. 
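
The split between data packets and control packets can be sketched from the PPP protocol field alone. The example assumes the number ranges assigned in RFC 1661, where values below 0x4000 identify network-layer data and higher values identify control protocols. The function names are hypothetical, and a real driver applies further checks that are omitted here.

```python
# Protocol numbers assigned for PPP (address/control bytes already stripped).
PPP_IP = 0x0021
PPP_IPV6 = 0x0057
PPP_IPCP = 0x8021
PPP_CCP = 0x80FD
PPP_LCP = 0xC021
PPP_PAP = 0xC023
PPP_CHAP = 0xC223


def protocol_of(frame: bytes) -> int:
    """Read the two-byte, big-endian protocol field at the start of a frame."""
    if len(frame) < 2:
        raise ValueError("frame too short to hold a protocol field")
    return int.from_bytes(frame[:2], "big")


def handled_by(protocol: int) -> str:
    """Simplified rule: network-layer data stays in the kernel,
    everything else is passed up to the user-space daemon."""
    return "kernel" if protocol < 0x4000 else "daemon"
```

Under this rule an IP datagram never leaves the kernel, while an LCP or IPCP packet is queued for the daemon to read.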

It is possible to implement PPP almost entirely in a user-space process, so that all packets including 
data packets are processed in a user-space daemon. This has the advantage of requiring very little 
kernel code and of being very flexible. The major disadvantage of this approach is that the overhead 
of switching to a user-space process for every packet limits performance, particularly for 
high-speed interfaces. 

4. The Linux PPP daemon. 

The user-space daemon used with the async-ppp Linux driver is called pppd, for PPP daemon. Pppd 
provides the control protocol negotiation (LCP, authentication, IPCP, CCP, etc.) and sets up the 
kernel PPP driver and network interface device according to the options negotiated with the peer. 
Pppd is part of the ppp-2.x package, maintained and developed by the author, which also includes 
PPP support for several other operating systems. 

The isdn-ppp implementation also uses a user-space daemon for handling the control protocols, called ipppd. Ipppd 
is derived from pppd with considerable modifications and extensions and is now maintained 
separately from pppd. 

Although pppd does include code to control asynchronous serial ports, e.g. to set the baud rate and 
character format, it doesn’t include any code for initializing and controlling modems using the 
Hayes AT command set. Instead, pppd has the capacity to call external programs or scripts for tasks 
such as setting up and tearing down the communications link. In the common situation of dialling 
an ISP using a modem, these external programs would issue the appropriate AT commands to the 
modem and interpret the responses. 

The ppp-2.x package includes a program called ‘chat’, which can execute simple expect-send 
scripts. Chat is sufficient for controlling modems in many cases, although it doesn’t provide much 
error recovery. 
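
The idea of an expect-send script can be illustrated with a toy interpreter. This is only a sketch of the concept, not the chat program itself: the real program reads from a serial port and has timeouts and abort strings. The telephone number, the modem replies and every name in the code are hypothetical.

```python
from typing import Callable, List, Tuple

# Each pair is (string to expect, string to send); "" means "nothing".
DIAL_SCRIPT: List[Tuple[str, str]] = [
    ("", "ATZ"),            # expect nothing, reset the modem
    ("OK", "ATDT5551234"),  # once the modem says OK, dial
    ("CONNECT", ""),        # wait for the connection, send nothing more
]


def run_script(script: List[Tuple[str, str]],
               send: Callable[[str], str]) -> bool:
    """Walk the pairs in order; fail as soon as an expected string is missing.

    `send` transmits one command and returns the device's reply.
    """
    reply = ""
    for expect, to_send in script:
        if expect and expect not in reply:
            return False
        reply = send(to_send) if to_send else ""
    return True


def fake_modem(command: str) -> str:
    """Stand-in for a modem that always connects."""
    if command == "ATZ":
        return "OK"
    if command.startswith("ATDT"):
        return "CONNECT 57600"
    return "ERROR"
```

Calling `run_script(DIAL_SCRIPT, fake_modem)` succeeds, whereas a stand-in that answers `BUSY` to the dial command makes the final expect step fail. With no retry logic, that failure simply ends the attempt, which is the limited error recovery noted above.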

An alternative is for pppd to be invoked once the communications link has been set up. There are 
several modem dialler programs available which can invoke pppd once the modem has dialled and 
connected to the peer. Wvdial (available at http://www.worldvisions.ca) is one example. Kppp, a 
graphical user interface for pppd which comes as part of the KDE desktop suite 
(http://www.kde.org), also includes a modem dialler. 

The disadvantage of invoking pppd from within a modem dialler program is that it makes it 
difficult to implement dial-on-demand, where the connection is only established once there is some 
data traffic to be sent. Since pppd controls the network interface unit, it is in the best position to 
know when there is data traffic and hence when the modem should be told to dial. 

Each pppd process controls one PPP connection and one network interface unit. Normally pppd 
would be invoked, either from the command line or by another process, when a particular PPP 
connection is to be established. Pppd does not itself provide a graphical user interface (GUI); 
instead there are several GUIs available which provide a graphical front-end to pppd. 

Pppd configuration is controlled by options files and secrets files. The options files basically list 
command-line options, which act similarly whether they are specified in an options file or on the 
command line. Pppd first reads the /etc/ppp/options file, then the user’s ~/.ppprc file if it exists, then 
a device-specific options file from /etc/ppp, and lastly parses the command-line options specified 
by the user. 
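
The reading order can be sketched as a list of sources that are merged in sequence. The example assumes that the device-specific file is named after the tty (for example options.ttyS0) and that a later setting simply replaces an earlier one; it ignores the privilege rules. The helper names are hypothetical.

```python
import posixpath
from typing import Dict, List


def option_sources(tty: str, home: str) -> List[str]:
    """Places options are taken from, in the order they are read."""
    return [
        "/etc/ppp/options",
        posixpath.join(home, ".ppprc"),
        "/etc/ppp/options." + tty,  # assumed naming of the per-device file
        "<command line>",
    ]


def merge_options(layers: List[Dict[str, str]]) -> Dict[str, str]:
    """Combine one dict of options per source; later sources win (assumption)."""
    merged: Dict[str, str] = {}
    for layer in layers:
        merged.update(layer)
    return merged
```

With this model, an `mtu` value given on the command line would replace one set in /etc/ppp/options, because the command line is parsed last.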

Since the act of bringing up a PPP link involves adding a network interface to the system and 
changing the system’s routing configuration, it is an operation which should only be able to be done 
by root or in configurations approved by the system administrator. This is enforced in general by 
requiring the peer to authenticate itself according to secrets (passwords) in the secrets files 
pap-secrets and chap-secrets in the /etc/ppp directory, which is controlled by the system 
administrator. The secrets files contain fields which specify which IP address(es) each authenticated
peer may use. 
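
As a rough sketch of how such a file constrains addresses, the following parses one secrets-file line of the form "client server secret addresses..." and checks a requested address against it. The helper names are hypothetical and the matching is deliberately simplified to exact addresses and a "*" wildcard; pppd itself accepts richer address syntax.

```python
import shlex


def parse_secrets_line(line):
    """Split one secrets-file line into fields; return None for blank or comment lines."""
    fields = shlex.split(line, comments=True)
    if len(fields) < 3:
        return None
    client, server, secret, *addresses = fields
    return {"client": client, "server": server,
            "secret": secret, "addresses": addresses}


def address_allowed(entry, ip):
    """Simplified check: only exact matches and the '*' wildcard are understood."""
    return "*" in entry["addresses"] or ip in entry["addresses"]
```

An entry with no address fields permits no addresses at all under this check.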

Several of pppd’s options are privileged, for example the ‘noauth’ option, which tells pppd to allow 
the peer to use any IP address without authenticating itself. Privileged options can be specified in
any of the options files controlled by the system administrator (basically those under /etc/ppp), but 
can only be specified on the command line if pppd is being run by root. 
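
The rule can be summarised in a few lines. This is a sketch under stated assumptions rather than pppd's implementation: the set of privileged options contains only the one named in the text, and options from any source outside /etc/ppp (including ~/.ppprc) are assumed to be treated like command-line options.

```python
# Only 'noauth' is named in the text; the real set of privileged options is larger.
PRIVILEGED_OPTIONS = {"noauth"}


def option_permitted(option, source, running_as_root):
    """Decide whether an option from a given source may take effect.

    `source` is a file path or the string "<command line>".
    """
    if option not in PRIVILEGED_OPTIONS:
        return True
    # Files under /etc/ppp are controlled by the system administrator.
    return running_as_root or source.startswith("/etc/ppp/")
```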

One way in which the system administrator can set up a configuration which allows non-privileged 
users to use privileged options in a controlled fashion is by creating options files under the 
/etc/ppp/peers directory. These options files can contain privileged options such as the noauth 
option along with settings such as the tty port and connect script to use, which cannot then be 
overridden by the user on the command line. The user invokes these options files using the ‘call’ 
option. 

As an exemption to simplify the common case of calling an ISP, the rule requiring all peers to
authenticate themselves has been relaxed, so that by default a peer that does not authenticate
itself can use any IP address as long as the system does not already have a route to that IP address. 
Thus, on a home system which has no other route to the internet generally, and thus no default 
route, a non-privileged user can use pppd to connect to an ISP without having to set up a suitable 
configuration file under /etc/ppp/peers. 
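
The relaxed rule amounts to a lookup against the existing routing table. A minimal sketch, assuming the routes are available as a list of network prefixes (the function name and the route representation are illustrative, not pppd's):

```python
import ipaddress


def unauthenticated_peer_may_use(peer_ip, existing_routes):
    """Allow the address only if no existing route already covers it."""
    addr = ipaddress.ip_address(peer_ip)
    return not any(addr in ipaddress.ip_network(route)
                   for route in existing_routes)
```

On a home machine with only a local Ethernet route, an ISP-assigned address passes the check; once a default route (0.0.0.0/0) exists, every address is already covered and the peer must authenticate.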

Pppd can be used for both dial-out and dial-in applications. For dial-out applications, typically 
either a configuration file under /etc/ppp/peers would be used, with pppd invoked from the 
command line, or one of the front-end GUIs would be used. For dial-in, the mgetty utility has an 
option to invoke pppd when a PPP frame is seen on an incoming call. 

5. New kernel driver structure. 

The async-ppp kernel driver in the 2.2.x Linux kernel series is implemented as a tty line discipline 
connected to a network device interface, as shown in figure 1. Although this structure works well in 
simple cases, it has some limitations and it makes it hard to share code with the other PPP 
implementations. Its chief limitation is that it makes it hard to implement multilink PPP, in which 
several communications links can be bound together into a single logical PPP link. Multilink PPP 
provides a way to obtain higher bandwidth and lower latency than can be provided by the 
individual links. 

These considerations motivated the development of the structure illustrated in figure 2, in which 
there is a generic PPP module which handles all of the details that are common to all PPP 
implementations, such as: 

• the network interface unit and the interface to the networking code 

• multilink PPP, i.e. splitting datagrams between multiple links, and ordering and combining 
received fragments 

• the interface to pppd, via a /dev/ppp character device 

• packet compression and TCP header compression 

• detecting network traffic for demand dialling and for idle timeouts 

The generic PPP layer interfaces on its lower side to PPP ‘channels’, which provide the 
encapsulation and framing required by specific kinds of communication links. The channel 
interface is deliberately very simple; it basically just provides for sending and receiving PPP 
packets stored in sk_buff (socket buffer) structures. 

When a channel receives a complete PPP frame, it calls the ppp_input() procedure in the generic 
layer to process the frame. Conversely, the channel provides a transmit function to the generic layer 
which the generic layer will call when it has a frame to be sent. The channel has the option of
rejecting the frame for flow-control reasons. In this case the channel should call the
ppp_output_wakeup() function at a later time when it can accept frames again, and the generic layer
will then attempt to transmit the rejected frame(s) again.
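
The flow-control handshake can be modelled in user space as below. This is a toy model for illustration, not kernel code: the class and method names are hypothetical, and output_wakeup() merely stands in for the role that ppp_output_wakeup() plays in the driver.

```python
from collections import deque


class ToyGenericLayer:
    """Holds frames until the channel accepts them."""

    def __init__(self):
        self.pending = deque()
        self.channel = None

    def send(self, frame):
        self.pending.append(frame)
        self._push()

    def output_wakeup(self):
        # Called by the channel when it can accept frames again.
        self._push()

    def _push(self):
        # Offer frames in order; stop at the first rejection.
        while self.pending and self.channel.start_xmit(self.pending[0]):
            self.pending.popleft()


class ToyChannel:
    """Accepts frames until its buffer is full, like a slow serial port."""

    def __init__(self, generic, capacity):
        self.generic = generic
        self.capacity = capacity
        self.buffer = deque()
        self.sent = []

    def start_xmit(self, frame):
        if len(self.buffer) >= self.capacity:
            return False  # reject the frame for flow-control reasons
        self.buffer.append(frame)
        return True

    def drain(self):
        # Pretend the hardware finished sending, then ask for more frames.
        self.sent.extend(self.buffer)
        self.buffer.clear()
        self.generic.output_wakeup()
```

With a channel capacity of two, a third frame stays queued in the generic layer until the channel drains and signals that it is ready.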

Currently there are two types of PPP channel implemented. The ‘ppp_async’ channel provides the 
encapsulation and framing required for asynchronous serial ports. It connects to the serial port as a 
line discipline, as did the old async-ppp code. Thus the ppp generic layer plus the ppp_async 
channel replaces the async-ppp code. The ‘ppp_synctty’ channel supports synchronous serial 
devices that have a tty-style interface. It is also implemented as a line discipline. 

Connecting a channel to the ppp generic layer is initiated from the channel code, rather than from 
the generic layer. The channel is expected to have some way for a user-level process to control it 
independently of the ppp generic layer. For example, with the ppp_async channel, this is provided 
by the file descriptor to the serial port. Setting this tty to the PPP line discipline creates an instance 
of a ppp_async channel. 

When a new type of channel is added, it will usually require some code in a user-space process to 
go through the process of initializing the channel and connecting it to the peer so that PPP 
negotiations can begin. This is analogous to the process of sending commands to an asynchronous 
modem to get it to dial the peer, and then going through whatever dialog with the peer is required to 
get it to invoke PPP service. 

Currently pppd has a ‘plugin’ feature which allows the system administrator to add code at runtime 
to implement new features (this is implemented using shared libraries and the dlopen() call). Pppd 
is being restructured to make it possible to add support for other types of channel using a plugin. 
This will make it possible to use pppd for applications such as L2TP (Layer 2 Tunneling Protocol) 
and PPP over ethernet, which is used in DSL (Digital Subscriber Line) applications. 

The new architecture makes it possible to implement PPP multilink in a natural and straightforward 
way, simply by allowing more than one channel to be linked to each ppp network interface unit. 
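
To make the splitting and recombining concrete, here is a small sketch of the idea: a datagram is cut into pieces, each tagged with a sequence number and begin/end flags, and the receiver orders the pieces before joining them. It is a simplification with hypothetical names; the fragment sizing is arbitrary and lost fragments are not detected.

```python
def fragment(datagram, n_links, first_seq=0):
    """Split one datagram into at most n_links pieces, one per link."""
    size = max(1, -(-len(datagram) // n_links))  # ceiling division
    pieces = [datagram[i:i + size] for i in range(0, len(datagram), size)]
    return [
        {"seq": first_seq + i, "begin": i == 0, "end": i == len(pieces) - 1,
         "link": i % n_links, "data": piece}
        for i, piece in enumerate(pieces)
    ]


def reassemble(fragments):
    """Order fragments by sequence number and join each complete datagram."""
    datagrams, current = [], None
    for frag in sorted(fragments, key=lambda f: f["seq"]):
        if frag["begin"]:
            current = []
        if current is not None:
            current.append(frag["data"])
            if frag["end"]:
                datagrams.append(b"".join(current))
                current = None
    return datagrams
```

Because the fragments travel over different links they may arrive in any order; sorting on the sequence number restores the original datagram.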

The changes to pppd to support multilink are still under development. Essentially there will be one 
instance of pppd for the bundle (the logical link, the concatenation of the individual physical links) 
and one for each physical link. When a pppd is negotiating with a peer and discovers that its link is 
to form part of an existing bundle, it will join the link to the bundle and inform the pppd for the 
bundle. The pppd for the link will continue to exist while that link is connected; it is needed to 
handle any ongoing LCP interactions with the peer, such as occur if the peer requests the link to be 
terminated. 

6. Future work. 

A future goal is to restructure the Linux isdn-ppp and sync-ppp subsystems to use the ppp generic 
layer. Doing so will reduce the duplication of code, and have the side benefit that features added or 
bugs fixed will be able to benefit all of the PPP implementations. 

The pppd support for multilink and for supporting diverse channel types through plugins is 
underway and should be completed shortly. 

In future pppd will provide for graphical user interfaces or other utilities to connect to it via a 
socket and obtain status information and control the link (subject to appropriate authentication). 
1 Introduction

Apache is the world’s most popular web server, and one of the foremost examples of open source collaboration. 
By various estimates, Apache and its relatives run about half the web sites in the world, and millions of 
copies have shipped on CDs. One could say that Apache 1.3 has become about as good as it can be as the 
workhorse of the web: balancing flexibility with some hardcoded structure, reliability and performance. 

Apache is directly descended from the NCSA httpd, one of the earliest web servers. A lot has changed since 
1994, and the web is by any measure thousands of times larger. The structure of Apache is still basically 
the same: 

• An httpd process accepts a TCP network connection from a client such as a web browser 

• The server reads an HTTP request 

• The URL maps into a filename, and from this a MIME content type 

• Access rules per username or IP are applied, from either the global server configuration file or a per- 
directory .htaccess file 

• Either the file is returned direct to the client, or the request is passed to a handler module to generate 
dynamic content 

• Finally, a log entry is written out 
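
The mapping step in this list (URL to filename to MIME type) can be sketched with Python's standard library. This is an illustration of the idea, not Apache's code: the function name and the document root are assumptions, and the access-control and logging steps are left out.

```python
import mimetypes
import posixpath
from urllib.parse import unquote, urlsplit


def map_request(url, document_root="/var/www/html"):
    """Map a request URL to a filename under the document root and a MIME type."""
    # Normalising from "/" means ".." components cannot climb above the root.
    path = posixpath.normpath("/" + unquote(urlsplit(url).path))
    filename = posixpath.join(document_root, path.lstrip("/"))
    content_type, _ = mimetypes.guess_type(filename)
    return filename, content_type or "application/octet-stream"
```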

For the past 18 months, the Apache core team has been engaged in a project to rewrite the server into what will
become 2.0. Apache 2.0 is a long-awaited revamp of the code to generalize it and establish a foundation for 
future developments. This unofficial paper covers the motivation, details and implications of a few of the 
most significant developments. 


2 Portability 

Apache is one of the most widely-ported packages, open source or otherwise. Supported platforms include 

• Unix variants including Linux, BSD, Solaris, Tru64, AIX, A/UX, SCO, HP-UX, IRIX, OSF1,
Dynix, SINIX, Ultrix, ConvexOS, Tandem NonStop Unix, Pyramid, and NeXT 

• Windows NT, Windows 2000, and Windows 95 

• Apple Rhapsody and Darwin 

• Embedded OSs such as LynxOS and QNX 

• The GNU HURD 

• OS/2

• Siemens BS2000

• IBM OS/390

1 <http://linuxcare.com.au/people/mbp>

What's New in Apache 2.0

AUUG2K - Enterprise Security, Enterprise Linux

This is a remarkably diverse range: some are EBCDIC, some ASCII and some Unicode; some are
completely “un-unixy”; some ancient and some barely in beta.

Supporting such a menagerie has come at a cost to the simplicity of the source code, which was originally 
intended to work well on standard Unix systems. One impact is that the code makes extensive use of 
conditional compilation to cope with platform idiosyncrasies. Writing to a standard POSIX API is also 
undesirable on some platforms which provide sub-standard implementations or faster paths. 

A team of developers led by Ryan Bloom has been developing a clean solution to these problems: a layer
called the Apache Portable Runtime, or APR, which presents a standard programming interface for server
applications 1. APR covers tasks such as file IO, logging, mutual exclusion, shared memory, and managing
child processes and asynchronous IO. APR shields the application from idiosyncrasies or incompatibilities in
the implementation of the standard, and will know the most efficient way to achieve each function on each
supported platform. One component, Ralf Engelschall’s MM library 2, hides the details of setting
up shared memory areas between processes, and provides an interface similar to malloc to manipulate them.

The traditional Apache structure, well known to webmasters, is of a single caretaker parent process and a 
group of reusable children. The parent reads the configuration and manages the pool of children. Each child 
at any time is either serving a single request or sleeping. Apache 1.x automatically regulates the size of the 
pool of children so that there are enough to cope with spikes in load, without using too many resources to 
maintain idle processes. Children that are busy serve one request at a time on a single socket. 
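
The regulation of the pool can be pictured as a periodic decision made by the parent. The sketch below is a simplification with hypothetical names: the thresholds are illustrative values in the spirit of the MinSpareServers, MaxSpareServers and MaxClients settings, and the real server adjusts the pool more gradually.

```python
def adjust_pool(idle, total, min_spare=5, max_spare=10, max_children=150):
    """Return how many children to start (positive) or stop (negative)."""
    if idle < min_spare:
        # Start enough children to restore the spare margin, within the limit.
        return max(0, min(min_spare - idle, max_children - total))
    if idle > max_spare:
        # Too many idle children are wasting resources.
        return -(idle - max_spare)
    return 0
```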

This design has served very well. One of its best features is that the server can survive the death of children, 
and so it is quite reliable in the presence of bugs or leakage in the Apache code, the operating system, or 
modules. At the same time it is more efficient than the canonical Unix model of forking a new child for 
every single request. This design works well up to quite high loads on modern Unix systems. On Linux in 
particular, context switches and forking new processes are cheap, and so this simple design is nearly optimal. 
One drawback of the isolation between processes is that they cannot easily share data, and so sharing session 
data across the server takes a little work. 

Another approach is to serve each request in a separate thread: this is the model used by most NT-based web 
servers. Although this takes away most of the protection between tasks, it allows the module programmer 
more flexibility and it can be faster on systems where threads are cheaper than processes, such as Windows 
NT and AIX. 

Apache 2.0 introduces a system called MPM, which hides the process model from most of the code. At run 
time, Apache can be configured to use threads, processes, a hybrid of both, or some other model. Modules 
can register new process models to suit their operating system or the application. One proposed example is 
to fork processes that run as different users to give increased security on machines that offer virtual hosts to 
multiple customers. 


3 Faster 

Apache has always had a greater emphasis on security, correctness, and flexibility than on out-and-out speed. 
However, the amount of web traffic increases continuously, and Apache improves over time. In particular, 
people are working on solutions to improve throughput without compromising on these other qualities. 

1 <http://www.ntrnet.net/~rbb/aprpres/> 

2 <http://www.engelschall.com/sw/mm/>
Key performance determinants for a web server are:

• Physical RAM to hold the document tree 

• Disk bandwidth, for reading in documents and writing log files 

• Network bandwidth to communicate with the client and perhaps with database servers or application 
servers 

• CPU cycles to parse and interpret the request, and generate the response 

• CPU cycles consumed by kernel overhead in context switching, starting and stopping processes, the 
filesystem, and the network interface 

In almost all cases, once the configuration has been checked and is basically reasonable, it’s simpler to just add more
hardware than to complicate the setup to improve performance. However, for high-traffic web sites (or
benchmarks!) there are several options. 

At very high utilization, the kernel overhead of switching tasks and doing IO becomes a problem. While
Apache’s flexibility is useful in general, a large number of requests are actually simple requests for static files
such as images and HTML documents. These can be handled as a special, faster case.

Apache provides one solution for this through the mod_mmap_static module, which ties files into the virtual 
memory space and avoids the overhead of open and read system calls to pull them in from the filesystem. 
This can produce a speedup on the order of 20% when the server has enough RAM to cache the whole 
document tree. 
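
The underlying technique, mapping files into memory once and serving them from the mapping, can be illustrated in a few lines. This sketch shows the general idea only and is not the module's implementation; the class name is hypothetical, and empty files are not handled because they cannot be mapped.

```python
import mmap


class MappedFileCache:
    """Map a fixed set of files at startup and serve their bytes from memory."""

    def __init__(self, paths):
        self._maps = {}
        for path in paths:
            with open(path, "rb") as f:
                # Length 0 maps the whole file; the mapping remains valid
                # after the file object is closed.
                self._maps[path] = mmap.mmap(f.fileno(), 0,
                                             access=mmap.ACCESS_READ)

    def body(self, path):
        # No open() or read() call is needed per request.
        return self._maps[path][:]
```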

Taking this approach further, we can run a specialized web server to handle simple requests, passing
everything else on to Apache. A proof of concept towards this approach is phhttpd 3, the “pointy-headed httpd”.
phhttpd serves all requests from a single process, and uses the sendfile system call to put most of the work 
back into the kernel, aside from interpreting the HTTP protocol. 
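
A minimal sketch of the sendfile approach follows. It is not taken from phhttpd: the function name is hypothetical, the response is a bare HTTP/1.0 reply, and Python's socket.sendfile() is used, which relies on the operating system's sendfile call where available and otherwise falls back to ordinary sends.

```python
import socket


def send_static_file(conn, path):
    """Send a minimal HTTP/1.0 response whose body is pushed with sendfile."""
    with open(path, "rb") as f:
        size = f.seek(0, 2)  # seek to the end to learn the length
        f.seek(0)
        header = "HTTP/1.0 200 OK\r\nContent-Length: %d\r\n\r\n" % size
        conn.sendall(header.encode("ascii"))
        # The file contents go to the socket without passing through
        # a user-space read loop when the platform supports it.
        return conn.sendfile(f)
```

The only work left in user space is interpreting the request and writing the header; copying the file body is left to the kernel.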

To cut out even more OS overhead, we can put a small HTTP server into the kernel itself. It would be a 
security and debugging nightmare to run all of Apache’s features in kernel space, but responding to requests 
for static files is perhaps reasonable. khttpd 4 implements this in Linux. Both khttpd and phhttpd can pass
requests they don’t understand on to an underlying copy of Apache. 

4 IO Layering 

Through version 1.3, Apache modules could and did write directly to the TCP connection back to the client. 
This is a very simple and efficient arrangement, but lacks flexibility. 

Secured transactions over SSL are a case in point. To do encrypted communications, the SSL module must 
intercept all traffic between the client and the handler module. With no abstraction layer in place this
was a difficult task, and made more difficult by the cryptography laws of the 1990s which prohibited adding 
convenient hooks. Webmasters wanting to run secure sites had the uncomfortable choice of applying ungainly 
patch sets to the Apache source, or using a proprietary and perhaps incompatible binary distribution. 

In Apache 2 and APR, all IO should be done through abstract IO layer objects. This allows modules to
hook into each other’s streams. It should be possible for SSL to be implemented through the normal module
interface rather than requiring special hooks. IO layers also help out internationalized sites by providing a 
standard place to do character set translation. 
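
The layering idea can be sketched as a chain of filters, each wrapping the output stream of the one before it. This is an illustration in Python, not the Apache 2.0 or APR interface; the function names are hypothetical. The example filter performs character set translation, the use case mentioned above.

```python
import codecs


def charset_filter(source, target):
    """Build a filter that re-encodes a stream of byte chunks."""
    def apply(chunks):
        # An incremental decoder copes with characters split across chunks.
        decoder = codecs.getincrementaldecoder(source)()
        for chunk in chunks:
            text = decoder.decode(chunk)
            if text:
                yield text.encode(target)
        tail = decoder.decode(b"", final=True)
        if tail:
            yield tail.encode(target)
    return apply


def layered_output(handler_chunks, filters):
    """Pass the handler's output through each filter in turn."""
    stream = handler_chunks
    for layer in filters:
        stream = layer(stream)
    return stream
```

A handler that emits Latin-1 text can then be served as UTF-8 by adding charset_filter("latin-1", "utf-8") to the chain, without the handler knowing about it.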

3 <http://www.zabbo.net/phhttpd/> 

4 <http://www.fenrus.demon.nl/>
IO layers may also support one of the most-requested module features: having one module filter the output
of another. For example, the PHP module could emit SHTML text, which would then be filtered through 
the server-side include module. (This will probably not be available to end users until Apache 2.1.) 


5 HTTP/1.1 

The next revision of the HTTP protocol is stabilizing as a standard and being adopted by software developers. 
HTTP/1.1 is somewhat more complex, but solves a number of protocol problems. 

In particular, HTTP/1.1 prefers the “chunked data encoding”. In previous versions of HTTP, the end of a 
file has been marked in general only by the end of the TCP connection. This is admirably simple, but has 
introduced some problems as the net has grown: in particular, since a lot of content is now dynamically 
generated, the client usually cannot know the content length ahead of time. Having to close and restart the 
connection slowed down consecutive transfers and made it hard to detect truncated files. 
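
As an illustration of how chunking removes the need to know the length in advance, the sketch below encodes a stream of pieces in the HTTP/1.1 chunked format: each chunk is preceded by its size in hexadecimal, and a zero-length chunk marks the end of the body. The function name is hypothetical, and chunk extensions and trailers are omitted.

```python
def encode_chunked(chunks):
    """Encode an iterable of byte strings using HTTP/1.1 chunked transfer coding."""
    for chunk in chunks:
        if chunk:  # skip empty pieces: a zero-length chunk would end the body
            yield b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    yield b"0\r\n\r\n"  # terminating chunk
```

Because the end of the body is marked explicitly, the connection can stay open for the next request, and a truncated transfer is detectable by the missing terminating chunk.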

Apache 1.3 included support for some HTTP/1.1 features, but was limited by the existing superstructure. 
IO layers help out here too, giving Apache a cleaner way to re-encode output to suit the network protocol.


6 What does this mean to you? 

Apache 2.0alpha was released at ApacheCon2000 in March: adventurous people can download it and try it 
out right away. But the source code has changed substantially, so you should be careful about using it on 
critical web sites. 

Webmasters, authors, and scripting application developers can anticipate a pretty smooth transition to 2.0: 
configuration files should be compatible or nearly so. You should think about upgrading in late 2000. 

One item of note for sysadmins is that Apache 2.0 uses the GNU autoconf tool for configuration, bringing it 
in line with most free software and cleaning up some configuration wrinkles. 

People running Apache on Win32 are an exception to the caution about critical sites: the new threading code is supposed to be
much better than the Apache 1.3 system, and so the 2.0alpha compares favourably to released 1.3.

Module developers will have more to do. Modules must be rewritten to be threadsafe and to use the new 
module interface. As a reward for this work, most portability and configuration issues will be handled by 
APR, so your modules will work well on more platforms. Modules will also get the chance to hook into many 
more interesting parts of the web server: when 2.0 stabilizes, they will be able to register IO filters and new 
process models. 
Experiences with Linux in Corporate Space

Richard Sharpe 

NS Computer Software & Services P/L 
26-May-2000 


1 Introduction 

Linux has achieved a great deal of recognition in the press over the last twelve months. It has
also achieved fame on the NASDAQ with the astronomical share price rises of RedHat 
and VA Linux, and has been brought back to earth again with recent price falls on the 
NASDAQ. 

So, everyone knows about Linux, but are any corporates actually using Linux? In this 
paper we will answer that question in the affirmative and we will discuss two Australian 
companies who are making serious use of Linux: National Credit Insurance (Brokers) Pty 
Ltd (NCI) and BRL Hardy Pty Ltd. 

We will discuss how long they have been using Linux, how they came to use Linux and 
the extent of Linux usage in these organizations. While the decision to use Linux at these 
organizations was not the result of a long and careful analysis, they have both benefited 
greatly from the decision, mainly in the area of stability of their network and reduced 
support costs. 

2 Linux at National Credit Insurance 

National Credit Insurance is an Australian Credit Insurance brokerage that has its head 
office in Adelaide, with sales offices in Brisbane, Sydney and Melbourne. Further 
expansion is contemplated around Australia. 

NCI uses Windows 9X on the desktop with standard desktop applications like MS Office. 

2.1 Linux is introduced at NCI 

NCI has used Linux since 1995 in its Adelaide office when a Lantastic network was 
replaced with a 486 running Linux. Their primary applications were: 

1. LMS, a credit limit system developed under DataFlex; and 

2. The Microsoft Office suite of tools for general office work. 

The DataFlex application ran on one system, and Lantastic was used to allow file and 
print sharing. However, NCI needed to upgrade their network to support more clients 
and allow multiple users to use the DataFlex application. A solution was needed that 
provided the necessary expansion capabilities, without costing a fortune. 

To allow the original Lantastic server to be replaced, the following had to be provided: 

• A method of running the DataFlex application on the server so it could be accessed by 
multiple users, and 
• A method to allow PC clients running the Office suite to share file space on the
server. 

Both of these requirements were met with Linux. The iBCS2 module for Linux allowed 
SCO UNIX applications to run under Linux, and DataFlex was available for SCO UNIX. 
In addition, Samba was a file and print server application for UNIX and Linux that
allowed Windows PCs to share files and printers on the Samba server. Furthermore, Linux
was available at a very low price, while Samba was free.

As a result, a 486 server with SCSI disks was purchased, Linux and Samba were installed 
on it, the iBCS2 module was also installed, and the DataFlex application was installed 
under Linux. To allow the PC clients to access Samba, Microsoft’s TCP/IP stack was 
installed on all the Windows for Workgroups PCs and a terminal emulator program was 
obtained to allow users to access the DataFlex application. 

The resulting server was relatively simple to manage, ran for long periods of time without 
needing to be rebooted or attended to, and was a very cost-effective solution. However, 
before long, NCI discovered the Internet and decided they wanted access to it. Another 
Linux server was added and a dial-up connection was used to provide Internet access. 
Apache was loaded on the Internet server to provide NCI’s web site. 

NCI had been introduced to Linux. The alternative was Windows NT for the serving 
functions, and a Cisco or other router for Internet access. However, this solution would 
have been much more expensive, and would have required more expensive client PCs, as 
the DataFlex application would have to be run on each client PC. The Linux solution 
reduced costs because: 

• Linux for the 486 was an order of magnitude cheaper than Windows NT 

• There was no need for client access licenses for Samba 

• A cast-off PC could be used for the Internet access server 

That solution lasted for almost twelve months, during which time more and more users 
were added to the system. 

2.2 The Growth of Linux at NCI 

In 1996, it became evident that an upgrade of the server was needed, so NCI looked at the 
possible solutions. Again, the most cost-effective solution was a Linux solution. A dual- 
Pentium 150MHz server was purchased, also with SCSI disks, and Linux was installed. 
Samba was installed, as was the iBCS2 module and DataFlex. 

This time, more work had to be invested to get Linux to use both processors of the dual- 
Pentium server, as the 2.0.x stream of Linux that was available did not support SMP by 
default. In addition, the iBCS2 code also had to be specially compiled for SMP support. 
However, Samba required no additional work to get it going. 
A Linux-based solution had provided NCI with a painless upgrade path, as no client
changes were required and all the same services were provided. However, the users no 
longer complained that their jobs took a long time to complete. All-in-all NCI had been 
pleased with the upgrade. 

However, within a few months, two additional services were required. These were:

1. Dial-up access to the DataFlex application by customers 

2. Network access from NCI’s Sydney branch, where a Linux server had been
installed to provide file and print services for PCs.

Both of these requirements were satisfied by adding a Stallion multi-port card to the main 
Linux server in Adelaide and providing a dial-up service that allowed a PPP link to be 
initiated from the Sydney server, as well as dial-up logons from customers. 

Again, Linux had been able to provide the required functionality without the need for 
costly additional hardware or software. 

That solution lasted for two years, until in 1998, after the server crashed because of 
hardware problems, a more redundant solution was desired. 

2.3 Linux (almost) everywhere 

In 1998, a major upgrade was undertaken. A pair of dual-Pentium Pro 200 servers were 
purchased. These servers continued to run the same software that had been run on the
previous server; however, they were set up such that one was available as a backup to the
other. The rsync utility was used to synchronize the file systems between the two
servers. 

Each of these servers was quite a bit more powerful than the previous servers, consisting
of: 

• Dual Pentium Pro 200 processors 

• 256MB of memory 

• A Mylex DAC960 tri-channel RAID controller with three 9GB SCSI disks 

• A fast Ethernet controller and a 10Mb/s Ethernet controller.

The fast Ethernet link was used for synchronizing the file systems between the two 
servers, and manual procedures were used to switch the two machines around. 

At the same time that the dual-Pentium Pro servers were installed, dial-up access was 
moved to a dedicated Linux RAS server, and a dedicated Linux server was installed as a 
backup server. That is, it had tape drives and a CD burner connected to it and performed 
backups for NCI. In addition, two Linux systems were used as a firewall. 

Again, within a few months, a Linux server was installed in each of Melbourne and 
Brisbane, and a Telstra frame relay network was installed between Brisbane, Sydney, 
Melbourne and Adelaide. The Linux servers in Brisbane, Sydney and Melbourne provide 
local file and print services, while access to the frame-based WAN is provided by Telstra-managed
Cisco servers.

Within the Adelaide office, the dual-Pentium Pro 200 servers provide file and print 
services as well as access to the DataFlex application for all users Australia wide. 

By now, there are some ten Linux servers in NCI, and Linux has proven to: 

• Be able to perform all the functions NCI has required of it 

• Be cost effective 

• Be manageable by one full-time junior employee and one part-time consultant 

• Have plenty of support avenues available 

• Be remotely manageable, so personnel in Adelaide can manage the servers in each 
state with ease. On the very few occasions that the servers in other states are not 
accessible, it has been easy to walk local staff through the procedures for recovering 
access. 

From NCI’s perspective, Linux has been a success, as it is used in preference to any other 
server platform, and provides all the services NCI needs. The only departure from Linux 
as a server OS was when an IIS server was required to allow a DataFlex-based Web 
application to be installed. The DataFlex-based Web functions only run on NT servers. 

3 Linux at BRL Hardy 

BRL Hardy is a well-known wine company with offices around Australia, as well as an
office at Chantilly in the US and Epsom in the UK. Their main office is at Reynella in
SA, and they have offices in many other places in South Australia, mainly at wineries. 

They have a frame relay network around Australia, with some 500 PCs, 100 laptops, 16
NT servers and a number of AS400 minicomputers. Their electronic mail application is 
Lotus Notes. 

3.1 BRL Hardy Discovers Linux 

BRL Hardy has been using Linux for about three years. They started using Linux 
because the cost of a vendor supplied solution was prohibitive, while Linux could be 
installed on old 486 PCs and could perform as perfectly good Internet gateway systems 
and dial-up servers. 

Initially, they installed Linux on an old Olivetti 486SX with 4MB of memory as a proof 
of concept, but it worked so well that they have rolled it out across their organization. 
Their old PCs now have a new lease on life as RAS servers. 

They currently have Linux in the following locations: 

Brisbane, Queensland    Linux RAS Single port
Sydney, NSW             Linux RAS Single port
Melbourne, VIC          Linux RAS Single port
Padthaway, SA           Linux RAS Dual port
McLaren Vale, SA        Linux RAS Single port
Reynella, SA            Linux Firewall, VPN/proxy/DNS
Reynella, SA            Linux mail relay
Reynella, SA            Linux Network Monitor (Big Brother)
Reynella, SA            Linux RAS 10 port
Blackforrest, SA        Linux RAS Single port
Berri, SA               Linux RAS Dual port
Clare, SA               Linux RAS Single port
Chantilly, USA          Linux Firewall/VPN/Mail gateway/proxy
Epsom, UK               Linux Firewall/VPN/Mail gateway/proxy


BRL Hardy has found that Linux is ideal for their requirements, because: 

• It is cost-effective and very light on resources 

• It is reliable, and does not crash as much as competing solutions do 

• It can be managed remotely 

• It is very flexible. 

They are able to manage their network with four people in Australia and two in the UK, 
which is a very small staffing need for a large network. 

4 Conclusion 

Both of the organizations discussed in this paper used Linux because it was available and 
it looked like it would cost a whole heap less than the alternatives. However, once they 
started to use Linux, they took to it with gusto. 

They have both found Linux to be a highly cost-effective solution that can be remotely 
managed and for which support can be easily obtained. In addition, full source code is 
available for Linux and most applications that run on it. This ensures that they will never 
be left without a support path should the operating system be discontinued, and that they 
can use anyone to support their servers, should their primary support people be 
unavailable. It also means that they can often fix problems themselves or get a third party 
to do so; they don’t have to rely on expensive support services that often cannot provide a 
solution. 
The Dynamic Duo

by Richard Keech, Red Hat Asia-Pacific 

A popular way of using Linux in business involves a pair of systems; one as an Internet gateway, 
and a second as a mail and file server. Such a configuration is suitable as the core server and 
Internet infrastructure for many small businesses. This paper describes how such a setup can be 
implemented with Red Hat Linux. All the aspects of the server setup described in this paper are 
implemented using free software. This paper is not intended to be a complete reference for any of
the services described. 

Why not just one system? All of these network services could be provided on a single server. Why 
use two? The short answer is that a single server would probably not be secure enough. A basic heuristic 
underlying the choice of a two-server architecture is ‘do not put user data or user accounts on a 
system directly exposed to the Internet’. This is not to concede any lack of security in Linux; 
it simply reflects the current reality of Internet security regardless of operating system. 

Suitability. The two-server arrangement represents a middle ground with regard to security. In 
many circumstances a single gateway host would be regarded as inadequate, and elaborate 
multi-firewalled configurations would be required. A single gateway arrangement might suit many 
small- to medium-sized organisations, for whom the elaborate firewall arrangements would be a 
prohibitively expensive overkill. 

The gateway 

The gateway host needs to: 

• forward mail received from outside through to the main server; 

• forward mail received from the main server to the outside recipient; 

• provide web proxy cache functionality; 

• perform packet filtering; 

• connect to an ISP using, perhaps, dialup PPP; and 

• provide access to DNS. 

The gateway should be as stateless as possible, ie it should contain the bare minimum of data and 
should not have any user accounts defined on it. The gateway should not need frequent backups. 
Data on the gateway, such as the contents of the web proxy cache, is expendable, so the need for 
backups is avoided. Important log files from the gateway can be saved to the main 
server and backed up there. 

Re-building with Kickstart. The stateless nature of the gateway means that, ideally, it can be 
re-built easily in the event of a compromise. A tool that can aid in this is Red 
Hat’s Kickstart system. It is simple to characterise the configuration in terms of a Kickstart file, so 
that the system can be re-built, cloned or upgraded very quickly. 

The main server 

The main server needs to: 

• send mail for external destinations via the gateway host; 

• receive mail for the organisation; 

• serve as a POP or IMAP server for desktop PC mail clients; 

• be a file and print server to desktop PC clients; 

• provide a domain name service; and 

• provide a DHCP service. 

In a typical business context, the main server can also be used to do other functions such as: 

• fax serving; 

• Intranet serving. 

This paper will not cover fax and Intranet configuration. 


Email 

Mail functionality is in two parts: mail transport, which is done with SMTP using Sendmail, and final 
mail delivery to the users’ mail clients, which is done with either IMAP or POP. For the sake of the 
example, IMAP will be used. 


Service: Mail transport 
Package(s): sendmail (sendmail-cf) 
Daemon(s): sendmail 
Start up script: sendmail 
Port(s): 25 
Configuration: /etc/sendmail.cf, /etc/sendmail.cw, /etc/mail/* 


Mail relay on the gateway host 

In addition to the default Sendmail configuration, the main thing to be configured on the gateway is 
mail relay, ie the gateway should take mail from outside, and pass it through to the main server if 
appropriate. The gateway does not require any user accounts to do this function, ie the mail 
accounts of the recipients need only be on the main server, safely behind the gateway. Mail relay is 
accomplished by configuring a mailertable as part of the gateway’s mail configuration. 

The default Red Hat mail configuration looks for a file /etc/mail/mailertable, which in the case 
of a domain gotham.com, might contain the following: 

gotham.com esmtp:[batman.gotham.com] 

This assumes the name of the main server is batman.gotham.com, which will be used as the 
example for the remainder of the paper. Sendmail doesn’t read the mailertable file directly, but 
instead reads an indexed database version of the file called mailertable.db. Simply run make in the 
/etc/mail directory to build this. 

Linuxconf. If you configure using Linuxconf then the mailertable feature is set up slightly 
differently. The use of mailertable is enabled by selecting the special routing database option in the 
basic sendmail configuration sheet (under the misc tab). The specific mailertable entry is 
generated with the configure special (domain) routing sheet. The equivalent mailertable entry 
would be generated by selecting add in the special routings sheet: 

Destination: gotham.com 

Forwarder: batman.gotham.com 

Mailer: esmtp 
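
For the hand-edited route, the steps can be sketched in shell as follows. This is only an illustration: it writes into a scratch directory rather than /etc/mail, the variable name workdir is hypothetical, and it assumes that Sendmail's makemap utility is what running make in /etc/mail would invoke to build the database.

```shell
# Scratch directory so the sketch is safe to run; on a real gateway
# the file would be /etc/mail/mailertable.
workdir=$(mktemp -d)

# One entry: route mail for gotham.com to the main server via esmtp.
# The square brackets tell Sendmail to skip the MX lookup for that host.
printf 'gotham.com\tesmtp:[batman.gotham.com]\n' > "$workdir/mailertable"

# Build the indexed database (mailertable.db) if makemap is installed.
# Running make in /etc/mail is assumed to do the equivalent.
if command -v makemap >/dev/null 2>&1; then
    makemap hash "$workdir/mailertable" < "$workdir/mailertable"
fi

cat "$workdir/mailertable"
```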


Outbound mail through gateway 

Since the main mail server is behind a gateway and (by design) can’t access the Internet directly, 
outbound mail is directed via the gateway host by adding a smarthost directive to the mail setup on 
the main server. This is the DS directive in /etc/sendmail.cf; if you are using Linuxconf, it 
is set using the mail gateway option on the Basic sendmail configuration sheet. If our hypothetical 
gateway host is robin.gotham.com, then that name is put in the mail gateway field. 


Mail Masquerading 

By default, mail will appear to originate from the host sending it. It is usually 
appropriate to have mail appear to originate from the domain (gotham.com), not the host 
(batman.gotham.com). This is accomplished using Sendmail’s masquerading facility (not to be 
confused with IP masquerading). This is done with the DM directive in /etc/sendmail.cf, which 
is set in Linuxconf using the Present your system as option from the Basic sendmail 
configuration sheet. 
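
As a rough sketch of the two hand edits on the main server, the DS (smarthost) and DM (masquerade) lines can be set with sed. The file used here is a two-line stand-in created in a temporary location, not a real sendmail.cf, and the variable name cf is hypothetical; on a live system the edit would be made to /etc/sendmail.cf and Sendmail restarted.

```shell
# Stand-in for the two relevant lines of /etc/sendmail.cf on the main
# server. A real sendmail.cf is much longer; only DS and DM matter here.
cf=$(mktemp)
printf '%s\n' 'DS' 'DM' > "$cf"

# Set the smarthost (DS) to the gateway and the masquerade name (DM)
# to the domain, writing the result to a new file for review.
sed -e 's/^DS.*/DSrobin.gotham.com/' \
    -e 's/^DM.*/DMgotham.com/' "$cf" > "$cf.new"

cat "$cf.new"
```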

Access control 

By default, Sendmail with Red Hat is fussy about whose mail it will relay. This is an anti-spam 
precaution. The standard Sendmail facility to deal with this is the file /etc/mail/access. This 
file does not need to be set on the main server. The appropriate entry in this file on the gateway host 
might look like: 

gotham.com RELAY 

Run make in the /etc/mail directory to make the change to access take effect. 

In Linuxconf, the equivalent functionality is invoked with Mail delivery system -> anti-spam filters 
-> Setting ’Relay for’ by name, which puts the domain name in the file /etc/name/name_allow. 


Machine alias 

The main server needs to be told to accept mail on behalf of the designated domain. This is done 
using the file /etc/sendmail.cw with an entry as follows: 

gotham.com 

Without this entry, the system will only accept mail for itself. The same file on the gateway host is 
not required. 

Final mail delivery 

Final mail delivery to users in our typical network will be accomplished using the IMAP protocol. 
The daemon for this service is started on demand. Both IMAP and POP support are provided in the 
imap package. Installation is simply a matter of loading the imap package, un-commenting the 
imap line in /etc/inetd.conf, and re-starting inetd. 


Service: IMAP 
Package(s): imap 
Daemon(s): imapd 
Start up script: (start on demand by inetd) 
Port(s): 143 
Configuration: nil 


Web proxy cache 

A web proxy cache server will give clients on the local network access to web services, without the 
clients needing direct connection to the Internet. The most suitable location for the Web proxy 
cache is on the gateway host, since this is directly connected to the Internet. Note that the main 
documentation for squid is its configuration file, /etc/squid/squid.conf. 


Service: Web proxy cache 
Package(s): squid 
Daemon(s): squid 
Start up script: squid 
Port(s): 3128 
Configuration: /etc/squid/squid.conf, /var/spool/squid 


Out of the box, squid will allow a 100MB area for web caching under /var/spool/squid, and will use 
about 16MB to 24MB of memory. Often these values will not be suitable. The size of the disk 
cache is set with the cache_dir option in squid.conf. The amount of memory used is set with the 
cache_mem option. Note that the cache_mem value is proportional to, but not the same as, the memory 
used. See the squid documentation for more information. 

Access configuration. Out of the box, squid will only allow users on the localhost to access the 
cache. The usual requirement is that all hosts on the local network are allowed access. To provide 
for this, create an access control list (ACL) using the acl directive, and an http_access entry. A 
suitable ACL, to be placed along with the other acl directives, might be: 

acl gotham src 192.168.0.0/255.255.255.0 

The http_access entry should be placed along with the other similar entries, but prior to the last one 
(which will probably be deny all). The new entry might be as follows: 

http_access allow gotham 

Logging issues. On a large and busy web proxy cache, the default logging regime can rapidly fill 
very large amounts of disk. The standard Red Hat log rotation scheme rotates squid’s log files 
weekly and keeps them for five rotations. It might often be necessary to increase the log rotation 
frequency, say to daily, or to reduce the verbosity of squid’s logging. Log rotation frequency is 
changed by editing the file /etc/logrotate.d/squid and changing each instance of the directive 
weekly to daily. 
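
The edit can also be scripted. The sketch below works on an illustrative stand-in for /etc/logrotate.d/squid written to a temporary file; the stanzas shown are assumptions made for the example and the real file's contents may differ. The variable name conf is hypothetical.

```shell
# Illustrative stand-in for /etc/logrotate.d/squid; the real file's
# contents may differ.
conf=$(mktemp)
cat > "$conf" <<'EOF'
/var/log/squid/access.log {
    weekly
    rotate 5
    missingok
}
/var/log/squid/cache.log {
    weekly
    rotate 5
    missingok
}
EOF

# Change every line that consists only of "weekly" to "daily",
# preserving the indentation.
sed 's/^\([[:space:]]*\)weekly[[:space:]]*$/\1daily/' "$conf" > "$conf.daily"

cat "$conf.daily"
```

Writing the result to a separate file allows it to be reviewed before it replaces the original.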


File server 

Configuring Linux to serve files to Windows boxes is usually done with Samba, a program which 
needs no introduction. Samba is the subject of copious other documentation, so this guide will 
simply review what is required in the gotham example. 

The context for setting up Samba for this example will be Windows 98 hosts connecting using NT 
domain authentication, and getting print services and file shares which are shared to specific 
groups of users. The PCs on this network are set up as per PC setup below. 


Service: SMB file & print serving 
Package(s): samba, samba-common, samba-client 
Daemon(s): smbd, nmbd 
Start up script: smb 
Port(s): 137,138,139 
Configuration: /etc/smb.conf, /etc/smbusers, /etc/smbpasswd 


While Samba does not provide complete Primary Domain Controller (PDC) capability, it is 
sufficient for Windows and NT hosts to: 

• login using NT-style domain logins; and 

• run a login batch file to load users’ preferences, desktop and start menu. 

A full description of configuring PCs for network logins and roaming profiles is provided as part of 
the Samba documentation at /usr/doc/samba-<version>/docs/textdocs/DOMAIN.txt. Most of 
the options required to support domain logins are the default settings with Samba under Red Hat 
Linux. The additional steps recommended are to enable the sections in the smb.conf file relating to 
configuring: 


• logon script 

• logon path 

• unix password sync 


Shared folders. The file share(s) intended for the main organisational data are best set such that 
access is based on Linux’s permission model, rather than by specifying access using the write list 
and valid users mechanisms in smb.conf. An important aspect of ensuring the painless sharing 
of data is to ensure that, where users belong to different groups, all shared directories within the 
share have the setgid bit set, ie chmod g+s directory should be run. Also, the group of the 
directories should be set appropriately to the group of people who should have access. This ensures 
that when users create files, those files will be correctly associated with the shared group. The 
create mode and directory mode options in smb.conf should be set as appropriate to make files 
and directories group-readable. 
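
The directory setup described above can be tried out safely in a scratch location. In this sketch the directory name projects and the variables share and grp are hypothetical, and the caller's own primary group stands in for the dedicated group a real share would use, so that no root access is needed.

```shell
# Scratch directory standing in for the root of the Samba share.
share=$(mktemp -d)
mkdir "$share/projects"

# Hypothetical choice: use the caller's own primary group so the sketch
# runs without root; a real share would use a dedicated group.
grp=$(id -gn)
chgrp "$grp" "$share/projects"

# Group-writable, no access for others, plus the setgid bit so that new
# files and subdirectories inherit the directory's group.
chmod 770 "$share/projects"
chmod g+s "$share/projects"

# A file created inside picks up the directory's group.
touch "$share/projects/report.txt"
ls -ld "$share/projects" "$share/projects/report.txt"
```

In the listing, the setgid bit shows up as an s in the group execute position of the directory's mode.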


DNS 

DNS is required for most of the network services on the network. In this example, the domain with 
which the network is associated is managed by (ie delegated to) an external agency. However, if it 
were required, the gateway host could be readily configured as the primary server for the domain. 
To give DNS access to the local network, the gateway host is set up as a caching-only nameserver. 
In addition to the bind and bind-utils packages, this requires the caching-nameserver package, which 
provides the necessary generic configuration files to make the host a caching-only server. 

Local domain. Despite the fact that the official domain for the organisation is hosted elsewhere (in 
this example), it is appropriate to configure DNS on the local network. A common way to deal 
with this is to give the local network its own generic domain identification, which applies only 
locally. This can be done by configuring the main server as being the primary server for the 
private domain (the name is arbitrary), and giving all IP addresses in the local network the 
necessary forward and reverse entries in the files in the directory /var/named/. The gateway can 
then be made aware of the private domain either by having the gateway resolve its addresses using 
the main server (instead of itself), or the gateway host can resolve from itself and be a slave server 
for the private domain. 


Service: DNS 
Package(s): bind, bind-utils, caching-nameserver 
Daemon(s): named 
Start up script: named 
Port(s): 53 
Configuration: /etc/named.conf, /var/named/* 


Packet filtering 

Packet filtering is an important part of system security and should be configured on the gateway 
host. The current way of doing packet filtering is the ipchains kernel facility, and the associated 
ipchains package. The suggested heuristic to apply using the filters is to only allow packets 
associated with services known to be required, and only then if other criteria are met. 

rkfw. A convenient way of dealing with packet filtering in the simple example situation described 
here is to use the rkfw facility, which makes configuring the firewall very simple. Using rkfw 
allows the firewall capability to be managed as a service in a runlevel, which can be started or 
stopped in the fashion of any SysV-style service. rkfw is not provided with Red Hat Linux, but is 
freely available by request from the author. rkfw assumes a simple two-port gateway arrangement, 
and configuration involves simply specifying the allowed inwards services in the file 
/etc/rkfw.conf. IP Masquerading is enabled automatically by rkfw. 

Masquerading modules. IP masquerading allows outgoing connections to be established through the 
gateway. Packets can be sent in both directions, but the originator of the connection cannot be 
outside the gateway. This works fine for simple protocols like telnet, but not for more complex 
protocols like ftp. IP masquerading can be made to work with some of these more complex 
protocols through the use of additional kernel modules in /lib/modules/<kernel 
version>/ipv4 such as ip_masq_ftp. A good way to have these kernel modules load when 
required is to make another module dependent upon them in the modules configuration file 
/etc/conf.modules as follows: 

add above ppp ip_masq_ftp 


Security 

User accounts. In the gotham network example, the end users of the services are on PCs, and do 
not need interactive shells. Accordingly, accounts should be made with no shell, ie the designated 
shell should be /bin/false. 

ssh. Remote access to and from the network should be done using ssh, which is a more secure 
alternative to ftp and telnet. ssh for Red Hat Linux can be obtained from http://www.zedz.net/. ssh is 
not Open Source software; openssh is a completely free alternative, also available from zedz.net. 

Prune. The configuration of both systems should be pruned so that only the services known to be 
required are actually configured and available. Candidates for removal here are telnet, ftp, finger, 
ident and linuxconf (ie the network service that allows web-based configuration using Linuxconf). All 
these correspond to entries in /etc/inetd.conf. Make sure the default run level is only defined 
to have the services specifically known to be required. 

Apply tighter security policy. The security policy for tcpwrappers should be set to be ’mostly 
closed’, ie only allow access where it is specifically required. This is implemented in /etc/hosts.allow 
and /etc/hosts.deny. Set /etc/hosts.deny to have ALL:ALL in it, thereby denying all access 
(to services controlled by tcpwrappers) unless specifically allowed in /etc/hosts.allow. 
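
A ’mostly closed’ pair of files might be generated as in the sketch below. It writes into a scratch directory rather than /etc, and the single allow rule (IMAP from the 192.168.0.0/255.255.255.0 local network used elsewhere in this paper) is an assumed example; the variable name etcdir is hypothetical.

```shell
# Scratch directory; on a real system these files live in /etc.
etcdir=$(mktemp -d)

# Default policy: deny everything that is controlled by tcpwrappers.
printf 'ALL:ALL\n' > "$etcdir/hosts.deny"

# Then open only what is needed. Assumed example: IMAP from the local
# network (a trailing dot matches every address with that prefix).
printf 'imapd: 192.168.0.\n' > "$etcdir/hosts.allow"

cat "$etcdir/hosts.deny" "$etcdir/hosts.allow"
```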

Apply errata. Aside from keeping the version of the OS up to date, keep an eye on errata as they are 
issued by subscribing to Red Hat’s announce list (redhat-announce-list@redhat.com). Red Hat 
Linux doesn’t have a concept of service packs or patch updates like systems such as Solaris. If part 
of Red Hat Linux is updated through the errata system, the requisite package(s) is/are re-issued in 
their entirety. With access to all the packages in the errata, the easiest way to apply the right ones is 
to use the rpm command with the freshen flag (-F), so that no additional packages are inadvertently 
added in the update process. 

Configure NTP. In the (hopefully) unlikely event that you’ll have to trace the steps of a hacker in 
your system, accurate time stamps will help you compare if you need to cross-reference with 
administrators at ISPs or other facilities. Synchronising your system clock using NTP is useful for 
this. See the xntp3 package and the associated documentation for more information. 


PPP 

Connecting the gateway host to the Internet is typically done using PPP. Configuring PPP is easily 
performed with Linuxconf under Config -> Networking -> Client tasks -> PPP. Alternatively, the 
Red Hat PPP configuration tool, RP3, can be used. The PPP interface should be configured to 
connect on boot, and persist, ie attempt to re-connect automatically if the link cuts out. 

In Australia, Telstra’s Big Pond Direct service is an efficient, economical way to have a dialup 
permanent service, suitable for small business. A permanent modem service currently costs $500 
setup, and $0.12/cached-MB and $0.19/non-cached-MB. This service also includes secondary 
DNS, secondary MX, and a small number of fixed IP addresses. 

PEERDNS. Upon connection from the gateway to the ISP via PPP, by default Red Hat provides for 
the resolver configuration to be changed to point to the ISP’s DNS server. Where the connecting 
host is a DNS server in its own right, this resolver configuration is not appropriate, ie the resolver 
on the gateway would probably point to the DNS server also running on the gateway. 

To ensure that the resolver configuration is not changed when the PPP link connects, set 
PEERDNS="no" in /etc/sysconfig/network-scripts/ifcfg-ppp0. 

PC setup 

There are a number of Windows PC client configurations suitable for use with Linux, but for the 
purpose of the example given here, the following Windows 98 configuration of networking 
components is suggested: 

• client for Microsoft Networks installed; 

• TCP/IP protocol installed; 

• DNS configured (DNS server configuration will be provided by DHCP); 

• Set to get IP addresses using DHCP; 

• Under network configuration, the PC must be set to use user-level security; 

• PC’s identification should be set to use domain authentication from the designated domain; 
and 

• WINS needs to be configured. 

Web. The Web browser on the PC should be set to use the proxy cache on the gateway host, 
robin.gotham.com:3128. 

Email. The PC needs to be configured to access mail via IMAP. Netscape, Outlook or Eudora 
would be suitable. 


Linux and Open Source Case Study: Less Talk, 
More Action 


By: Con Zymaris 
From: Cybersource 
Email: conz@cyber.com.au 

Synopsis: 

Linux and Open Source have moved past the advocacy and experimentation stage for many firms, 
and into the rollout stage. This paper looks at the implementation details, the hitches, travails and 
triumphs of several major Linux and open source projects. Both infrastructure and development 
projects will be covered. 


Note: The projects and organisations that we will cover in this paper range from small to medium 
enterprise through to large multinationals. Wherever Cybersource has been given explicit 
permission to divulge further details and the names, we will do so. 


Network Servers for The Centre for Molecular Biology, Epworth 
Hospital, Richmond. 


Who: CMBM are a scientific research organisation, with around 40 staff. They operate a mixture 
of Windows 9x, NT, OS/2 and MacOS based workstations. CMBM performs cutting edge research 
on several biological processes, which is recognised world-wide. CMBM also has some specific 
requirements, such as the speedy scanning, transmission and storage of very large (tens of 
megabytes) medical image files. 

What: We were called in to help with their wayward NT-based file, print and mail server. CMBM 
had been having ongoing stability problems. At one stage, their NT server would need weekly 
re-boots. 

Why: We proposed a Linux based replacement server, running on equivalent hardware. Our 
suggestion to the Client was that we could replace the existing server, service-by-service, running 
both old and new servers in tandem for some time. 

How: 

File Serving 

For file serving, we used the then current version of Samba. The process of mirroring a production 
server was quite intricate, as the scientists often worked extended hours, so taking a ’snapshot’ of 
home and work directories from old server to new was tricky. 

We used tools which work with Samba to allow the migration of NT domain passwords. This 
allowed for a smoother transition from one server to another than would otherwise have been 
possible, as the users did not have to re-submit their passwords into the Linux server upon account 
creation. Linux had the advantage here, with the flexible and powerful Pluggable Authentication 
Modules (Linux-PAM, http://www.kernel.org/pub/linux/libs/pam/), which allow Linux to use up to 
several dozen different sources for user authentication, including SMB, LDAP, RADIUS etc. 

Server Management 

The Webmin tool was used as the initial front-end for the Client’s IT staff to manage the new server. 
Webmin (http://www.webmin.com/) is not platform specific, and provides an intuitive web 
interface for all the standard systems management requirements (adding users, checking disk 
partitions etc.). In more recent releases of Red Hat Linux, Linuxconf has become the de facto GUI 
and Web based admin tool, so now we would have the option of using either Webmin or Linuxconf. 

Tape Backup 

As would normally be expected in a site of this size, DDS 3 and DDS 4 tape backup units are used. 
We have always found the Sony units reliable under Linux. 

Hiccups: We had substantial problems with both Windows Roaming Profiles and printing. I’m 
not sure that we ever managed to resolve all the roaming profile issues. Users would 
expect their standard Windows desktop to follow them wherever they logged in, which it would do 
most of the time, but on some occasions it would not, and then they would have their desktop icons 
and menu items changed irrevocably on those desktops which didn’t ’comply.’ Grumble. 

Printing was another ongoing problem for the first little while. CMBM is an organisation which 
creates a high volume of large print-jobs. Some of these were spooled through the SMB printing 
service of the Linux/Samba server, while others were fed directly to a myriad of networked laser 
and high quality A0 plotter-sized devices. Printer queue problems were a weekly or fortnightly 
occurrence, and were only resolved, by accident on our part, through the eventual 
decommissioning of the NT server. 


Result: In over two years of running Linux on a couple of different servers, there has not been a 
single unscheduled minute of downtime. This is a vast change from the weekly NT problems. 


Application Servers for IVECO/International Trucks, Dandenong. 

Who: With nearly 600 employees, 90 dealer franchises and an annual revenue around $250 
million, Iveco is perhaps Australia’s biggest manufacturer of haulage vehicles. It has multiple 
offices in Melbourne, offices in Sydney, Adelaide, Brisbane and New Zealand. It is also a 
subsidiary of the multinational Iveco Group, the truck manufacturing unit of automobile giant Fiat. 

What: After nearly a decade of using a WANG VS based mini system to operate its core business 
software, Iveco decided that Unix was the best replacement platform for the aging Wang system. 
And out of the Unix variants, Iveco’s IT group decided on Linux. 

Why: Due to the ease of porting the large volume of existing legacy code to Linux, and the cost 
advantages of Linux, Iveco decided to work on a Linux pilot project, which we were involved with. 

How: 

Linux as an Application Server 

The bulk of the migration work was in transferring the code from the Wang VS-COBOL to a 
Linux-based ACUCOBOL environment. With minimal re-working of some VS-COBOL specific 
features, Iveco had a fully functional equivalent system running on Linux, in just a few months. 

By rolling out a pilot project, Iveco’s IT management were able to ’trial’ a Linux solution, and 
proceed only if they found it acceptable. In the end, not only did Linux live up to expectations, but 
surpassed them handsomely. Iveco gives the example of over five months of 100% uptime on their 
Laverton, Victoria server as part of the reason they gave the go-ahead for full deployment. 

Linux on the Desktop 

Linux was also considered as a replacement for Windows-based terminals on staff desktops, 
running a terminal emulator to access the core business applications. 

Linux moves further 

As often happens in many firms, Linux and open source software gain a foothold, IT managers 
see that they work and work well, then consider them for other duties. So it is with Iveco. After the 
successful rollout of the Linux based application servers running the core business software, Iveco 
management then considered, and opted to use Linux for corporate email services, proxying and 
file/print serving. 

Hiccups: As with many Client sites, there are issues with roaming profiles under Windows. This 
seems to be a problem under either NT or Linux/Samba. 

One Linux-specific hiccup encountered is an incompatibility with Seagate DDS 4 tape drive units. 

Result: Another high-profile site, with forward thinking IT management who recognise a good 
thing when they see it, being rewarded by workhorse open source software. 


Network Infrastructure Servers for Mannesmann VDO Australia, 
Heidelberg West 


Who: VDO is one of the largest specialised parts manufacturers for the automotive industry 
worldwide. It has a network of several hundred systems, running a combination of Unix, Windows, 
Linux and FreeBSD. 

What: VDO needed specific network infrastructure solutions, for which our staff recommended and 
helped deploy Linux and FreeBSD systems. 

Why: Among the advantages suggested for using open source solutions to meet these specific 
requirements was the lack of a need to acquire more hardware, as legacy hardware could be 
redeployed. Also, no new licences for NT or Unix would need to be purchased. Introduction of 
open source servers was gradual. They had to prove themselves, first in roles which were not 
considered critical for normal network operation, then increasingly as vehicles to replace aging or 
legacy servers and finally because they were the best solution, period. 

How: 

CD-ROM Servers 

Linux was used as the basis of network-mounted file server jukeboxes for the Windows and 
NT workstations on the VDO network. Due to the read-only nature of the CD-ROMs, permission 
issues and integration into the NT domain mechanism were simpler. 

Adabas Database Server 

VDO operated a mainframe-class Pyramid Nile Unix host. One of the purposes of this was to serve 
an ADABAS database. When it came time to consider decommissioning this host, Linux was 
chosen as the platform on which to run a native version of the ADABAS database server. Minimal 
effort was required in swapping in the Linux server; the client software packages just ran as they 
would have against the Nile host. 

Firewalls 

Along with standard internal services, open source operating systems and firewall packages were 
used as part of the construction of the VDO firewall. Among the reasons for this were the high 
quality and level of functionality of the stateful packet filtering tools available with FreeBSD, as 
well as the oft-quoted advantages of no-hidden-code, especially in security related software. 


Hiccups: None really 


Result: One of the quotes that I like to invoke when I talk to IT managers about Linux and open 
source, comes from the former network manager for VDO, Ron Fabre, who, when asked about how 
he found the increasing number of Linux and FreeBSD servers in his organisation, answered by 
stating that if he had to re-create his network infrastructure again, he would use Linux and 
FreeBSD servers almost exclusively. 


Application Server and Telemetry Devices for Joint-venture 
Company of Major International Petroleum Corporation 

Who: A high-tech company which develops world-leading system management software for the 
pumping hardware and telemetry used by the petroleum and bulk-fuel industry at fuel terminals. 
With multiple offices throughout Australasia, this company has been in this line of business for 
over a dozen years. 

What: Linux was selected by senior management as the best platform to deploy the new server 
version of their existing major application. This application has been on various proprietary Unix 
platforms for over a decade, and consists of over 600,000 lines of C code. It also used a proprietary 
database server. Prior to our efforts, the internal developers had been at work porting all the code 
to Linux and the GNU C Compiler, as well as developing a wrapper for the existing proprietary 
database calls to the open source PostgreSQL RDBMS (www.postgresql.org). Our team was tasked 
with the completion of a TCP/IP communications layer and protocol which will connect the Linux 
server system to various clients. 

Why: In a rare change from the norm, the senior management at this company selected Linux over 
a proprietary non-Unix solution, and had to sell the idea to their techs. Among the reasons that 
management gave for selecting Linux over competitors such as NT was that this was software for 
mission-critical, 24 x 7 operational use, with substantial cost issues involved with any downtime. 
For this reason, they discounted an NT based solution. They also felt that with Linux’s growing 
reputation as a rock-solid platform, they could use this to their advantage from a marketing 
perspective. Other reasons to use Linux were the large number of high quality development tools, 
as well as the extremely long life-spans of these development tools. Quite often, development tools 
on the Windows platform are deprecated every 2-3 years, whereas this firm maintained operational 
systems for large clients, built on tools with decade-long or more life-spans, like GCC and GDB. 
PostgreSQL was chosen due to its full-featured nature, and zero licence costs. 

How: 

Development Environment 

The development systems we used, both on the Client’s site and back at the office, were fairly standard Red 
Hat and Mandrake-based Linux workstations and servers. Some of the developers preferred the 
GUI approach that tools like KDevelop offered (automated function and class referencing etc.) and, 
as KDevelop was installed by default with a Mandrake workstation install, we used Mandrake for 
those developers. Installing KDevelop by hand required tracking down half a dozen disparate 
RPMs. 

Among the issues we faced with development on a project this large (600+ KLOC in over 300 
source files) was long build times. On 450 MHz Pentium systems with 128 MB of RAM, a make 
clean followed by a full build took a full 45 minutes. 


Deployment Details 

Besides the main application server running Linux, the overall software package offered by this 
company for petrochemical clients requires the use of dozens of telemetry devices, all based on 
single-board computers running Linux on flash-card technology. This chameleon-like ability of 
open source OSes to become whatever users want of them was something that we would see 
time and time again. 

Hiccups: One of the few non-trivial problems the development group as a whole hit with this 
project was in the migration from the proprietary database to PostgreSQL. As part of the legacy 
code, the database routines had been created using non-relational querying and data retrieval 
techniques. While these worked well with the previous (pseudo-relational) database, performance 
suffered under PostgreSQL. In short, the previous database needed to be re-commissioned, and the 
code which forms the bulk of the database layer for the application needed to be re-written from an 
RDBMS perspective. 


Result: The fruits of this development are currently being deployed at a number of petrochemical 
bulk-handling plants in South East Asia. Linux and open source tools will thus form the basis of a 
broad range of solutions in the petrochemical industry. 


Network Management Server and Broadband Communication 
Controllers for Large Telco and International Electronics and 
Communications Hardware Company 

Who: Both of these companies are amongst the largest in their respective business domains 
worldwide. They joined together for this project to deliver a high-speed, broadband fibre-optic 
network which encompasses all mainland state and federal capitals. 

What: The primary content of this network is MPEG video. As part of this project, highly 
specialised hardware and software was developed to route the various video streams around the 
country-wide network. The majority of the telemetry and management hardware used is running 
Linux. The tools which were used for development include open source MySQL (www.mysql.com) 
database servers and the open source PHP programming language (www.php.net). Other tools used 
were automated ssh sessions, scripted with the Expect programming language. 


Why: In a rather amusing anecdote, senior project management at the telco recounted 
how they demanded that the comms hardware company provide a Linux-based solution, or else 
they would find another supplier who would. The comms hardware supplier came back with an 
NT-based solution; the telco walked. This forced the supplier to start using Linux. Further discussions 
with said senior telco staff showed that they were in awe of Linux and open source tools like Perl. 
They just loved the robustness and flexibility. They had been using these technologies to develop 
specialised network management nodes which exposed HTML interfaces for some time, and would 
not countenance going back to overpriced, proprietary, non-standards-compliant solutions which 
would lock them into any single vendor. 


How: Each node of the broadband network housed three separate Linux systems, which were 
tasked with different roles in the overall MPEG/video translation/communication process. The 
network management side of the equation was taken care of by a specifically designed 
PHP/MySQL/Apache-based node and video configuration tool. A backup mechanism for using 
ssh/telnet to log into these servers from remote locations, such as the telco’s network monitoring bunker, 
was also built. Command line utilities were created by the hardware supplier to control the 
specialised video compression/streaming units. These were developed in standard C++ using GCC, 
and were then scripted into action by our team using a combination of Expect (Tcl) and PHP. The 
Linux servers used for the entire solution were based on hardened, specially designed rack-mount 
units, with no moveable hardware, using flash-cards for boot-up and log storage. 


Hiccups: No real hiccups with this project. Some belt squeezing was required to get all the 
pre-requisite tools and software to fit onto the flash cards, but then this is one of the reasons they 
chose open source tools in the first place; this ability to cut and roll-your-own makes this kind of 
solution possible. Try fitting NT, IIS, ASP, SQL-Server and a myriad of other scripting tools and 
remote communications utilities into a 20 MB flash device. 


Result: So successful was this project that the now totally-converted-to-Linux comms hardware 
vendor is planning on rolling out the same Linux-based solution throughout various European 
telcos. 


Internet Gateway and Mail Services for Melbourne Public Radio 
Station 


Who: This Client is the most popular and innovative public radio station in Australia, with over 21 
years of broadcasting history. 

What: The radio station staff member tasked with IT decided that giving all the staff and 
presenters/programmes their own email would be a great idea. They came to us to get Linux 
to do the job. 

Why: After having run a Windows and NT environment, our contact at the radio station was 
hoping for something with more power, higher stability and less cost. Being a public (i.e. not for 
profit) radio station, cost is an obvious issue. 

How: 

Internet Gateway 

Both an Internet services box (mail, proxy, DNS etc.) and an IPChains firewall were 
established for the Client. 

Hiccups: Not so much a hiccup, more of a weird configuration gotcha. Less than a week after we 
went live, the Client called in with a problem: no-one could read their mail. After a cursory scan by 
one of our sys-admins (logged in remotely with ssh) we noticed that the /tmp directory was marked 
as read-only. This meant that the IMAP/POP daemon could not create the temporary data files that 
it needed in order to serve incoming mail-client connections. A simple chmod, and problem solved. 
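
A check of this kind is easy to script. The following is a minimal sketch in Python, assuming the conventional /tmp mode of 1777 (world-writable with the sticky bit); the exact mode restored on the Client's server is not recorded here, and the function names are illustrative only.

```python
import os
import stat

# Assumption: the conventional mode for /tmp, world-writable plus the sticky bit.
EXPECTED_TMP_MODE = 0o1777


def tmp_mode_problem(path):
    """Return None if path has the expected mode, else a short description."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode != EXPECTED_TMP_MODE:
        return "%s has mode %o, expected %o" % (path, mode, EXPECTED_TMP_MODE)
    return None


def repair_tmp(path):
    """Equivalent of 'chmod 1777 path'; needs ownership of the directory or root."""
    os.chmod(path, EXPECTED_TMP_MODE)


# Read-only inspection of the real /tmp, where one exists.
if os.path.isdir("/tmp"):
    print(tmp_mode_problem("/tmp") or "/tmp looks healthy")
```

Run periodically (from cron, say), a check like this would flag the problem before users notice that their mail client cannot connect.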


Result: The outcome of all this? The contact at the Client is happy with the Linux solution, so 
much so that they are considering using a Linux server to run the streamed MP3 radio station they 
are planning, with open source software such as Icecast (www.icecast.org). 


File, Mail and Internet Server for MB Sales, Reservoir 

And something to demonstrate the small/medium-business side of the systems market for Linux and 
open source. 

Who: MB Sales is a distributor of plumber’s supplies throughout Australia. They operate offices 
in both Melbourne and Sydney, and have under 20 staff. 

What: MB Sales was in the market for a file server to run their recently chosen high-end 
accounting software. With up to a thousand invoices per week, this business could ill afford 
downtime on their file server. Their accounting solution provider offered them only NT and 
Netware servers, so MB Sales came to us for other options. We suggested a Linux-based solution. 

Why: With the combination of Linux and Samba, we would be able to offer them the same 
file/print serving features of NT or Netware, at minimal software costs. More importantly, for a 
small/medium business like MB Sales, we were able to also provide web, email and proxying 
services with Linux, for no additional licence costs. 

How: As it turned out, convincing MB Sales to consider an open source solution was 
straightforward. Convincing their accounting software supplier was another matter. They kept throwing 
down more and more complicated what-if scenarios that the Linux solution had to hurdle before 
they would give the green light to deploying their system on Linux. Eventually, at MB Sales’ 
insistence, they agreed. 

File Serving 

A standard Linux/Samba file server was established, and, as there had been no equivalent system in 
place prior to this, there were no issues about migrating user /home data, mail and passwords. 

Internet Gateway 

A dial-on-demand system was established (with much hassle) to allow all the Client’s staff to 
browse the web from their Windows workstations. Email was established using the standard 
IMAP/POP and Sendmail servers shipped with the distro. The Eudora mail-client was chosen for 
the Windows workstations, as it is very simple and free. 


Hiccups: After a few months of solid use, the Client started reporting problems with the server. 
These were of a lock-up/system freeze nature. After remotely scanning the system, and finding no 
obvious software problems, we sent out one of our hardware guys to inspect the server. As it turns 
out, the CPU fan had stopped working, causing nasty CPU overheat problems. After replacing both 
CPU and fan, the server now shows uptimes of a couple of hundred days at a time. 

Result: After having to spend a bit of time convincing the Client of the advantages of Linux and 
open source, it is heartening to see that their main IT person now runs Linux at home, and has 
actually started sending pro-Linux advocacy emails in our direction! ;-) 


About the Author: Con Zymaris is the CEO of Cybersource Pty. Ltd., a long-standing IT & Internet 
Professional Services company. Con has been using and programming computers since 1979 and 
using the Internet since 1989, and is an enthusiastic advocate for open-source software libre. While 
computers were always a passion which morphed into a career, at the University of Melbourne he 


Abstract 

In recent years, Linux has enjoyed growing popularity in the server market, and more recently in the 
desktop market. There is now growing interest in the technical community for Linux on handheld 
devices. This paper reviews this emerging popularity and discusses Compaq’s experiences in porting 
Linux to the StrongARM-based iPAQ H3600. 

Introduction 


Linux server sales are growing rapidly¹, and more recently Linux has begun to make some inroads 
into the desktop market². Though not yet a player in the consumer market, there has been increasing 
interest from the technical community over the last few years in Open Source operating systems, 
such as Linux, for handheld and wearable devices. 

The Itsy Pocket Computer [1] developed by Compaq Research was one of the first handheld 
devices to run Linux. The LinuxCE website [2] lists a number of other projects under way to port 
Linux to a variety of handheld computers that were originally designed to run Microsoft’s Windows 
CE. You can even run Linux on your Palm Pilot if you want³. Linux is also the operating system of 
choice for MIT’s Ember project [3], a business-card-sized processor board intended as a general 
processing workhorse for a variety of projects in and around the Media Lab. 

And Linux is not the only Open Source operating system moving onto smaller devices: there is at 
least one project under way to port NetBSD to a handheld device⁴. 

Along with the Open Source operating system projects under way, there are a number of projects 
developing windowing systems and desktop environments for handheld devices. For example, the 
Microwindows Project [ 4 ], 


Footnotes 

1. A recent IDC report forecasts revenue in the U.S. Linux server market to compound 
annually at a rate of 30%, bringing shipments up to around 9% of the total entry-server 
market by 2003 [5]. 

2. Although perhaps not indicative of worldwide trends, a recent DataQuest report 
predicts the market share of Linux will not exceed 5% of workstations shipped in 
Europe by 2003 [6]. 

3. The Linux port to Palm Pilot was initially done by uCLinux [7], and is now being 
evangelised by Craig Comstock [8], but from what we can gather this is not in 
widespread use. 

4. NetBSD/hpcmips [9] brings the NetBSD operating system to MIPS based Windows CE 
PDA machines such as the Compaq Aero 2100, the IBM WorkPad z50, and the Philips 
Nino 500. 


The Open Handheld Program 

The project discussed in this paper is part of the recently announced Open Handheld Program [10]. 
The goals of this program are: 

• To provide a focal point for the discussion and development of innovative software and user 
interfaces for handheld and wearable computing devices 
• To ensure the results of Compaq’s ongoing research into pocket computing (for example, our 
research into power consumption [11]) are available to the community 

The program hosts the handhelds.org website for use by projects in the open handheld space, no 
matter what the operating system or hardware. 

To get the ball rolling, Compaq has been working on a project to get Linux running on Compaq’s 
latest handheld computer, the StrongARM-based iPAQ H3600 [12], announced in May 2000. This 
project has made extensive use of the lessons Compaq learned from our earlier work on Itsy [1] and The 
Compaq Personal Server [13]. For the past three months the authors have been part of the team 
porting the Linux operating system and core device drivers to the iPAQ H3600. The project is a 
collaborative effort between Compaq’s Palo Alto, CA and Cambridge, MA research groups and 
Compaq’s advanced development groups based in Palo Alto, CA and here on the Gold Coast, 
Australia. 

Structure of this paper 

We begin this paper with an overview of the hardware architecture of the iPAQ H3600 and an 
overview of the Open Source software being developed for the device. Next we detail our 
experiences writing device drivers for this device and describe some of the problems we 
encountered during the porting process. We conclude with a discussion of future Open Source work 
planned for the iPAQ H3600. 


iPAQ H3600 hardware architecture 

The iPAQ H3600 is a palm-sized computer weighing in at just under 180g 
and measuring only 130 x 83 x 16mm; roughly the same size and weight 
as a Sony Walkman. 

This 206MHz StrongARM 1110-based pocket-Hercules comes standard 
with 16MB of Flash ROM and 32MB of RAM. It also has a 240 x 320 
colour TFT LCD display, touch screen input, and a fully-specified 
expansion bus that allows others to create companion peripherals. 

This section of the paper gives a brief description of the major hardware 
components of the iPAQ H3600. 

Processor 

The StrongARM 1110 32-bit RISC processor can be clocked from 59MHz to 221.2MHz. Intel 
reports performance of 235 Dhrystone MIPS at 206MHz [14]. 

Memory 

The iPAQ H3600 has 32 MB SDRAM and 16 MB ROM (Flash) on board. Memory can be 
expanded using a 32MB or 64MB CompactFlash memory card in conjunction with the 
CompactFlash Card Expansion Pack. 

Screen 

The touch-sensitive TFT LCD display has a resolution of 240 x 320 pixels and 4096 colours. Its 
reflective technology, combined with a front light and ambient light sensor, provides outstanding 
clarity in a wide variety of light conditions. The display is driven directly by the StrongARM [14], 
and the touch screen is connected to the CPU through a custom microcontroller (which is also used 
to interface the buttons and LEDs). 

Networking 

In addition to the RS232 and USB serial interfaces provided by placing the iPAQ H3600 on a 
cradle, a 56Kbits/sec CompactFlash Modem Card is available for use in conjunction with the 
CompactFlash Card Expansion Pack. The PC Card Expansion Pack also makes available a wide 
variety of other networking hardware, including wireless connectivity. There is also an infrared port 
that can be used for data exchange with another suitably equipped computer. 

Multimedia 

Full stereo sound capability is provided through a Philips 1341 codec and mini-stereo jack for 
connection to external speakers or headphones. On-board, there is also a microphone and speaker 
for mono sound input and output. 

Expansion 



The iPAQ H3600 has been designed to allow an expansion ’jacket’ to slide over it and connect 
with the on-board expansion bus. Initially, Compaq is shipping a CompactFlash Card Expansion 
Pack and a PC Card Expansion Pack, with plans for various wireless jackets for the U.S. and 
Europe to be available some time early next year [15]. 

Compaq will release the hardware specification of the iPAQ H3600 expansion bus to allow 
hardware and software developers the freedom to design their own interfaces [12]. The 
possibilities for this feature are limited only by the imagination. 


Power Options 

The iPAQ H3600 can be powered in three ways: 


• Standalone with its own internal battery. 

• On a cradle with a separate power supply. Two types of cradle are available: one with an 
RS232 cable and one with a USB cable. 

• From an auxiliary battery in an expansion pack jacket. 


Open Source Software for iPAQ H3600 

This section describes the software components currently available for the iPAQ H3600. 

Operating System 

The Linux support for the iPAQ H3600 is available on handhelds.org as a set of patches to the 2.4 
kernel. 

The iPAQ H3600 is factory fitted with the Windows CE operating system in Flash. Part of our 
project was to write a simple Windows CE application to allow the user to reflash the device with 
Linux. This application also lets the user save the Windows CE image for later reflashing if 
necessary. 

Networking 

We currently have the Linux TCP/IP stack running on the iPAQ H3600 with support for the 
following network services: 

• PPP over RS232 

• NFS over USB (coming shortly) 

Currently there is no Ethernet hardware or drivers. We anticipate it will be fairly easy to get 
existing drivers for all kinds of PC Card devices (including Ethernet) running, once we have 
implemented the basic kernel support for the PC Card Expansion Pack. 

Graphics 

With the assistance of Keith Packard, we have a port of the XFree86 [16] windowing environment 
running under Linux on the iPAQ H3600. As well as viewing your applications directly on the 
handheld screen, X also gives you the flexibility to view handheld applications on your desktop 
screen. 

At the time of writing this paper, the Open Handheld team is still evaluating a number of window 
managers for use on the iPAQ H3600 and other handheld devices. We hope to have more 
information on window managers by the AUUG conference in June 2000. 

User input 

The touch screen interface allows users to manoeuvre their way around the screen using the touch 
of a pen. 

For alphanumeric input the user has a number of options: 

• Handwriting recognition software, such as scribble. The Open Handheld team is working 
towards making available the same handwriting software that was previously demonstrated on 
Itsy. 

• A virtual keyboard to provide QWERTY-style input. The Open Handheld team is also 
working towards making this part of the Itsy demonstration application suite available. 

• Although of little use to application users, TCP/IP over PPP will allow developers to connect 
the iPAQ H3600 to another machine hosting the TCP/IP/PPP protocol and log on to the iPAQ 
H3600 from the other machine. This mode of operation will mostly be used by application 
developers and when installing software or performing other system maintenance. 

File System 

We store system applications and commands (for example, the contents of the /bin, /usr/bin, and 
/usr/sbin directories) on a read-only, compressed file system called cramfs. This filesystem is 
stored in Flash. All temporary (modifiable) files are stored on a battery-backed RAM disk, 
formatted with the ext2 file system. 

The mount command on the iPAQ H3600 prototype looks like this: 

# mount 
none on / type shm (rw) 
none on pipe: type pipefs (rw) 
none on / type proc (rw) 
/dev/root on / type cramfs (rw) 
none on /proc type proc (rw) 
/dev/ram1 on /tmp type ext2 (rw) 
/dev/ram2 on /var type ext2 (rw) 
/proc on /proc type proc (rw) 
none on /dev/pts type devpts (rw) 
/dev/flash4 on /usr type cramfs (rw) 
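
The split between compressed Flash and writable RAM disk can be read straight off a listing like this. As a minimal sketch in Python, assuming every line has the form "DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)" as in the output above (the function names are illustrative and not part of any tool shipped on the device):

```python
def parse_mount_line(line):
    """Parse 'DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)'; return None if the line does not fit."""
    parts = line.split()
    if len(parts) != 6 or parts[1] != "on" or parts[3] != "type":
        return None
    return {
        "device": parts[0],
        "mountpoint": parts[2],
        "fstype": parts[4],
        "options": parts[5].strip("()").split(","),
    }


def mountpoints_by_fstype(mount_output):
    """Group mount points by filesystem type, skipping lines that do not parse."""
    groups = {}
    for line in mount_output.splitlines():
        entry = parse_mount_line(line)
        if entry is not None:
            groups.setdefault(entry["fstype"], []).append(entry["mountpoint"])
    return groups


# Three lines taken from the listing above.
SAMPLE = """/dev/ram2 on /var type ext2 (rw)
/dev/flash4 on /usr type cramfs (rw)
none on /dev/pts type devpts (rw)"""

groups = mountpoints_by_fstype(SAMPLE)
print(groups["ext2"])    # mount points backed by the RAM disk
print(groups["cramfs"])  # mount points backed by compressed Flash
```

Grouping by filesystem type makes it obvious which directories will lose their contents if the battery-backed RAM is lost, and which live in Flash.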


Our experiences writing Linux device drivers for the iPAQ 
H3600 

Before writing a device driver for the Linux platform you may wish to purchase a good reference 
book. We highly recommend Linux Device Drivers by Rubini [17]. It is, by far, the best material on 
writing device drivers available at the time of writing. If you want to write (or understand) device 
drivers this book is a must. Another good book is Linux Kernel Internals by Beck et al. [18], which 
deals mainly with the kernel and complements the information found in Rubini. 

Since we are dealing with a full-featured Linux operating system, writing device drivers for the 
iPAQ H3600 is nearly identical to writing device drivers on any other Linux system. The 
main differences are outlined below. 

Compiling & Linking 

All device driver source code needs to be compiled and linked. You obviously cannot do this using 
your friendly gcc compiler on your PC; you need a StrongARM cross compiler and linker. The 
resulting object file is then: 

1. Copied from the development machine (your PC) to the target machine (iPAQ H3600). 

2. Loaded using the insmod command on the iPAQ H3600. 

Future enhancements will allow native development, which means your target and development 
machines will both be the iPAQ H3600. The ability to perform native development is just one of 
the benefits of using such a powerful processor as the StrongARM 1110. 

Once you have your driver loaded you need to debug it. We found printk the most useful tool for 
driver debugging. However, GUI tools that allow remote kernel debugging, such as Code Medic 
[19], are gaining popularity. We intend to investigate the usefulness of this tool once we have gdb 
running on the iPAQ H3600. 

Hardware Control 

Device drivers need to do some low-level setup, such as enabling interrupts and setting up the 
SA1110’s onboard UARTs. Our work on the iPAQ H3600 involved all of the above. You 
also need to be familiar with the StrongARM architecture and the memory map [20], [21]. 

Debugging 

Before you can debug using printk, you need a terminal on which to display debug messages and 
issue shell commands. During development of the touch screen driver we used a terminal emulator 
that supported the Zmodem protocol (for example, minicom, which ships with most Linux 
distributions). We connected the terminal emulator on our Linux development PC to the RS232 port 
on the iPAQ H3600. This is the procedure we used to download and install our driver code on the 
iPAQ H3600: 

1. Establish a session with the iPAQ H3600 using the terminal emulator - you should be able to 
log in as root and execute all the familiar commands such as ls, cp, rm. 

2. Cross-compile the driver source on your development machine using the cross compiler and 
linker. 

3. On the iPAQ H3600, prepare for Zmodem transfer by issuing the rz command. 

4. Use the terminal emulator’s in-built Zmodem protocol to send the file to the target machine. 

5. Install the driver using the insmod command. 

6. Verify that the driver is loaded by issuing the cat /proc/modules command. 

Subsequent messages output by the driver’s printk statements are now output on the terminal 
emulator via the iPAQ H3600 serial port. 
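
Step 6 can also be checked mechanically, for example against terminal output captured on the development PC. This is a minimal sketch in Python, relying only on the fact that the first field of each /proc/modules line is the module name; the module names and sizes in the sample are hypothetical, not those of the actual iPAQ H3600 drivers.

```python
def loaded_modules(proc_modules_text):
    """Return module names: the first whitespace-separated field of each non-empty line."""
    return [line.split()[0] for line in proc_modules_text.splitlines() if line.strip()]


def is_loaded(module_name, proc_modules_text):
    """True if module_name appears in the given /proc/modules contents."""
    return module_name in loaded_modules(proc_modules_text)


# Hypothetical contents of /proc/modules as captured from the terminal session.
# On a Linux host the real file could be read with:
#     with open("/proc/modules") as f: text = f.read()
SAMPLE = "h3600_ts 4096 0 (unused)\nppp_generic 13616 0\n"

print(is_loaded("h3600_ts", SAMPLE))
print(is_loaded("usb_eth", SAMPLE))
```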

One advantage of debugging on an iPAQ H3600 is that when it crashes, the subsequent reboot is 
fast and painless as there is no time-consuming fsck to worry about. 


Interesting problems encountered installing Linux under 
Windows CE 

As with any software development project, we encountered a number of tricky bugs to solve along 
the way. This section discusses the problems we had in establishing a safe way to install Linux over 
Windows CE on the iPAQ H3600. 

Since we initially had only one development board, it was important to get the process right before 
modifying the contents of Flash. If we overwrote the Windows CE image with a bootloader that 
didn’t work, there would be no way to load another image using software. This would not have 
been such an issue, had we been able to get JTAG [22] working reliably enough to read and write 
Flash memory. However, it would later have become an issue when we repeated the process on 
form-factor devices, because these have no JTAG connector and there was the possibility of 
hardware changes that would have required code changes to work correctly. 

We needed a way to run our initial software on these devices in a non-destructive manner. The 
obvious choice was to run any test application in RAM in such a way that it was not affected by the 
state in which Windows CE had left the device prior to executing the application. 

Running the bootloader from RAM 

Prior to beginning this work, our colleagues in the Cambridge lab had the bootloader code running 
from Flash on an identical development board. (They were able to reliably write Flash via JTAG on 
their device.) So we knew the bootloader code worked with the hardware. 

We wrote a Windows CE user application to disable interrupts, copy the bootloader code to a 
selected physical address, flush Icache, flush Dcache, disable the MMU, and jump to the start of the 
bootloader code. Our initial implementation did not work... 

Bootloader problem 1 

The first problem with the implementation was that we were using physical memory that could be 
in use by the (Windows CE) operating system. We investigated two ways to solve this problem: 

• The first was to obtain a virtual memory block from Windows CE¹, copy the bootloader code 
there, then calculate the corresponding physical address, and finally jump to the physical 
address after disabling the MMU. The problem with this approach was that we were unsure 
whether Windows CE allocates pages contiguously, so the bootloader code might be 
fragmented in physical memory and therefore not execute correctly. 

• The second was to find a block of memory that is not used by the Windows CE operating 
system and copy the bootloader there. 

We chose the second approach, and found a block that Windows CE uses for a driver. We decided 
this memory was safe to use, since the driver no longer executes after the bootloader starts. The 
approach worked, and we could then get part way through the bootloader execution. 
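
The concern with the first approach can be stated precisely: once the MMU is off, the bootloader only runs correctly if the pages backing the virtual block are consecutive in physical memory. A minimal sketch of that check in Python, assuming a 4 KB page size and assuming the physical address of each successive virtual page has somehow been obtained (the addresses below are made up for illustration):

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages


def is_physically_contiguous(phys_pages, page_size=PAGE_SIZE):
    """True if each page starts exactly one page after the previous one.

    phys_pages lists the physical address backing each successive virtual page
    of the block, in virtual-address order.
    """
    return all(b - a == page_size for a, b in zip(phys_pages, phys_pages[1:]))


# Hypothetical translations for a three-page bootloader image.
print(is_physically_contiguous([0xC0100000, 0xC0101000, 0xC0102000]))
print(is_physically_contiguous([0xC0100000, 0xC0105000, 0xC0101000]))
```

In the second case the code would be fragmented once address translation is disabled, which is exactly the failure mode that led us to prefer a fixed physical block.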

Bootloader problem 2 

The second problem with the implementation was that execution of the bootloader code would hang 
some time before displaying the main menu. Although the bootloader always runs at physical 
address 0x00000000 when running from Flash, it had also been tested running in RAM at various 
other addresses on the Cambridge development board, so we believed the code was relocatable. 

By instrumenting the bootloader code with debug output sent to the serial port, we determined that 
its assembly code was being executed but not its C code (the assembly code does some initialisation 
before jumping to the C code). After jumping to the start of the C code, no more output was seen on 
the serial port. We spent a lot of time investigating the likely causes of the problem: 

• A spurious interrupt going off 

• A trap being generated that would end up in Windows CE code when Windows CE was no 
longer running 

• Code corruption occurring while copying the bootloader to RAM 

The problem turned out to be something else. Although the bootloader code was meant to be 
relocatable, the binary image showed assembly instructions that accessed memory relative to the 
beginning of Flash (address 0x00000000). Because we had not changed the contents of Flash, this 
meant that the bootloader was referencing memory locations that were still set up for Windows CE. 
This also explained why our colleagues in Cambridge had been able to run the bootloader at other 
addresses in RAM: their board had identical copies of the bootloader installed in Flash and in 
RAM. 

The short-term fix for this problem was to set the linker flags so that the text and data were linked 
for the RAM address where the bootloader was being loaded. 

Booting the Linux kernel from RAM 

At this point the bootloader was running from RAM and we were able to download kernel images 
and begin execution. The kernel image would successfully decompress and complete kernel 
initialisation as far as the trap_init call in start_kernel (see init/main.c). The trap_init 
call is the last architecture specific initialisation in the boot process; it sets up the StrongARM 
exception vector table with the handlers for these exceptions: reset, undefined instruction, software 
interrupt, prefetch abort, data abort, IRQ and FIQ. 

By instrumenting the code with debug output sent to the serial port, we knew that execution did not 
proceed past the stmia instruction that writes the exception vector table. At this point, virtual 
address 0x00000000 (the start address of the StrongARM exception vector table) was mapped to 
physical address 0xc0120000. We suspected that the stmia instruction was failing due to a data 
abort (perhaps because the processor was in a non-privileged mode that did not allow write access 
to that page of memory). We were unable to verify this because we had no way to install a debug 
exception handler that would output a character to the serial port when called². 

For now, we have bypassed this problem by storing the bootloader image in Flash. This allows us to 
execute the bootloader from a cold boot, guaranteeing that the processor and devices are in a known 
state (as opposed to whatever state they may be in when executing the bootloader from RAM in the 
Windows CE environment). We intend to revisit the RAM boot problem and may have a solution 
before this paper is presented. 

Booting the kernel from Flash 

We encountered one final problem before we could successfully log in to the iPAQ H3600. After 
using the bootloader running in RAM to write the bootloader, the kernel, and the cramfs filesystem 
to Flash, the kernel booted and we got the login prompt. However, after logging in and displaying 
the message of the day, the system would immediately log us back out. 

The problem turned out to be a corrupt cramfs filesystem. The bootloader code that was used to 
write to Flash was accidentally asserting a PCMCIA output enable, which meant that the lower 
16 bits on the data bus were occasionally driven low. When this happened, the data we were 
writing to Flash had its lower 16 bits clobbered. 

Once the bootloader was fixed, the filesystem could be written to Flash correctly and we could log 
in. 
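
Corruption of this kind has a recognisable signature when the Flash contents are read back and compared with the image that was written. The sketch below, in Python, is not the tool we used; it assumes 32-bit little-endian words and that both the original image and a read-back copy are available as byte strings.

```python
import struct


def clobbered_words(expected, actual):
    """Return byte offsets of 32-bit words whose upper 16 bits match the
    expected image but whose lower 16 bits read back as zero."""
    offsets = []
    word_count = min(len(expected), len(actual)) // 4
    for i in range(word_count):
        (want,) = struct.unpack_from("<I", expected, i * 4)
        (got,) = struct.unpack_from("<I", actual, i * 4)
        upper_matches = (want & 0xFFFF0000) == (got & 0xFFFF0000)
        if want != got and upper_matches and (got & 0xFFFF) == 0:
            offsets.append(i * 4)
    return offsets


# Made-up three-word image; the second word has lost its lower half.
image = struct.pack("<3I", 0x11223344, 0xAABBCCDD, 0x00010002)
readback = struct.pack("<3I", 0x11223344, 0xAABB0000, 0x00010002)
print(clobbered_words(image, readback))
```

A verify-after-write pass of this sort would have pointed at the data bus rather than at the filesystem image or the kernel.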


Footnotes 

1. Windows CE does not provide any call to request a specific block of physical memory. 

2. We later found that writing to physical address 0xc0120000 via a different virtual 
address worked, so we will use this method to continue our investigations. 


Future work 

This section discusses future Open Source work planned for the iPAQ H3600 hardware platform 
and how interested parties can join the Open Handheld Program. 

Next steps for iPAQ H3600 development 

• Complete the core driver development work, including USB, infrared, and audio. Once USB 
is working, we’ll be able to get NFS working over this, which will make development work 
easier. 

• Get native compilation working; coupled with NFS this will allow developers to compile on 
the iPAQ H3600 and relieve them of the intricacies of setting up the cross-development 
environment. 

• Add kernel support for the PC Card Expansion Pack and the CompactFlash Expansion Pack. 

• Streamline the process for average users to install Linux on iPAQ H3600 and allow them to 
recover Windows CE if they later change their mind. 

• Integrate the power management code developed for Itsy [11] into the iPAQ H3600 kernel 
and drivers. These improvements involve lowering the CPU clock speed while in the kernel 
idle loop and an infrastructure for shutting down devices into low power mode when 
requested by the user pressing the power button. The drivers need callbacks written for the 
standard power management hooks added into the kernel. Fortunately the hardware design of 
the iPAQ H3600 lends itself to good power management, with each subsystem having the 
capability to be powered down (as do most handhelds). 

How to Join the Open Handheld Program 

We are keen to get projects signed up and participating in the community at handhelds.org; projects 
relating to other hardware and/or other operating systems are most welcome. We encourage you to 
visit the website and sign up to our mailing lists. 
Goal 

Using typical hardware and the Linux OS, we would like to create an environment where data stored on 
such a system cannot be modified by someone gaining root. 

1. Introduction 

Features of Linux’s second extended filesystem (ext2) lend themselves towards mimicking a system running 
from a CDROM. This allows organizations to provide read-only software and documentation over a 
network without burning CDs. 

"Pass-through" systems which allow clients to interact with the familiar (a web server) while providing data 
(indirectly) to something unfamiliar (a mainframe, for instance) could also run from such a read-only 
environment. 

Additionally, such a system could be "opened up" slightly, to behave like a CDR. Because CDR technology 
requires that substantial headers and trailers be written even for small amounts of data, only a limited 
number of write operations can occur. A Linux virtual WORM, on the other hand, can handle as many 
writes as desired, and is limited only by network bandwidth and disk capacity. Such a device can be used 
to store: 

• log information (syslog, TCP wrapper data, sulog, etc.), 

• digest (hash) information from other systems, 

• home-grown "logs" which could be processed to produce other data structures. 

This paper details how to install and configure Slackware Linux to create a virtual CDROM or WORM 
drive. Other variants of Linux should also be able to work in this way. 

In cases where systems contain unchanging data, or where systems hold no data permanently, it may be 
possible to configure the system in such a way as to make the loss of root irrelevant. 

2. How it works 

The ext2 filesystem allows a number of additional attributes to be set for files, beyond the normal i-node 
information such as permissions, uid/gid, etc. The additional attributes we’re interested in are as follows: 

• i - immutable. Once a file’s immutable attribute has been set, the file cannot be moved, deleted or 
changed. 

• a - append only. The i-node may only be appended to, not rewritten from the start. 

• S - synchronous writes. Data written to this i-node is immediately written to disk. 

The key to setting up a virtual CDROM or virtual WORM lies in two utilities provided with the Linux 
Second Extended Filesystem (ext2): lsattr (list attributes) and chattr (change attributes). 


1 Copyright 2000, by the Rector and Board of Visitors of the University of Virginia. Permission granted to reproduce so long as 
author details and affiliation are preserved. 

2 Note: This is a work in progress--the latest version of this document can be found at: 

http://www.itc.virginia.edu/~drs8h/linuxvcdromworm/ 

DISCLAIMER: No warranty or merchantability is claimed or implied in this paper. Use at your own risk. 

CDROM-like operation can be achieved by setting the entire system to have the immutable attribute set, 
with only a limited number of i-nodes (such as /dev/null) not having it applied. 

WORM-like operation can be achieved by setting most of the system i-nodes to have the immutable 
attribute turned on, except for a select set of i-nodes, which will have the a and S attributes on instead. S is 
not strictly required, but if the system cannot be shut down (as discussed below), it will be needed to 
minimize data loss. 
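
The two modes can be sketched as a pair of small shell functions. This is only an illustration under stated assumptions: the function names and the dry-run switch are hypothetical, and by default the commands are printed rather than executed, since actually applying them needs root and an ext2 filesystem.

```shell
# Dry-run sketch: commands are printed, not executed, unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# cdrom_mode ROOT [EXCEPTION...]
# CDROM-like: everything under ROOT becomes immutable, then the listed
# exceptions (paths relative to ROOT) have the attribute removed again.
cdrom_mode() {
    root=$1; shift
    run chattr -R =i "$root"
    for f in "$@"; do
        run chattr -i "$root/$f"
    done
}

# worm_files ROOT FILE...
# WORM-like: the listed files (relative to ROOT) become append-only
# with synchronous writes.
worm_files() {
    root=$1; shift
    for f in "$@"; do
        run chattr =Sa "$root/$f"
    done
}
```

For example, `cdrom_mode /mnt dev/null` followed by `worm_files /mnt var/log/error_log` prints the three chattr commands that would lock down a disk mounted under /mnt while leaving one log file appendable.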

For both environments, some daemons will need to be configured and possibly patched to work. (For 
starters, anything that wants to create an empty file for locking purposes, like Apache.) 

Note that a mostly-CDROM system can (and might need to) utilize WORM-like operation for selected files 
(if you want an http error_log, for instance). 

At the end of the installation, the CDROM/WORM system will need to have its lsattr and chattr commands 
removed (otherwise it’s a simple matter for someone gaining root to chattr the system back to its normal 
state). This requires you to attach the disk ultimately running the CDROM/WORM to a normal Linux 
system for tailoring. 

3. Installation Overview 

It’s helpful to have a system with at least two hard disks, and where both can be easily attached during 
selected parts of the installation. At the end of the process you will have only one disk attached. (Or, 
rather, you will have the CDROM/WORM system running with a collection of filesystems, and no other 
code or unused space will be available.) 

The basic steps are: 

1. Install a normal, full-blown Linux system on one of the disks (hereafter the "setup disk") and 
compile a new, minimal, static kernel. 

2. Compile any applications you might be using on the final CDROM/WORM disk (hereafter the 
"cdrom disk"). 

3. Switch the disks and install a minimal Linux system onto the cdrom disk. 

4. Reboot the setup disk and mount the cdrom disk under /mnt. 

5. Copy the applications to the cdrom disk, configure as required, and issue appropriate chattr 
commands for what you need (CDROM or WORM). 

6. Reboot with just the cdrom disk. 

4. Setup System Installation/New Kernel Compilation 

Install the setup system on its own disk and create a new kernel with the absolute minimum number of 
features your cdrom disk requires. 

The appendix contains the resulting .config file from one of my kernel "make config" runs. 

(On my system, I disabled support for the /proc filesystem. More on that below.) 

You should write the resulting kernel to a diskette and rdev it if necessary (although both systems should 
reside on separate disks and should be the only filesystem on their respective disks). 

If you end up booting from this diskette for your cdrom system, DON’T FORGET TO WRITE-PROTECT 
THE DISKETTE ... 
5. Compile your applications 

For a CDROM system, the Apache server is the natural choice for serving requests. 

Unfortunately, Apache needs some small changes before it will run in this environment. While the PidFile, 
ErrorLog and CommonLog can all be directed to /dev/null (or in some cases to a "Sa" (append-only) file), 
the LockFile code wants to create a normal file from scratch. 

The appendix details how to change Apache_1.3.12 to get around this problem. 

For a WORM system, you will need a daemon to read or append data to one or more worm files. A sample 
daemon is provided in the appendix. Note that this is a perl application which does not currently compile 
(or run properly) under perl 5.6, so you will need a perl interpreter installed on your cdrom disk to run it. 
(You’ll probably rewrite it anyway to suit your own needs.) 

6. Install a Minimal Linux System onto the CDROM Disk 

For Slackware 7.0, this involved selecting these disk sets and then selecting only the following components 
from them: 

A 
    ide or scsi (not both) 
    aaa_base 
    bash 
    bin 
    cxxlibs 
    devs 
    e2fsprog (optional--see below) 
    elvis 
    etc 
    fileutils 
    glibcso 
    grep 
    gzip 
    hdsetup 
    ldso 
    less 
    do not install "modules" 
    do not install "modutils" 
    procps (optional--see below) 
    sh_utils 
    shadow 
    sysvinit 
    tar 
    txtutils 
    zoneinfo (optional) 

D 
    perl (only if running the perl lwormd application) 

N 
    tcpip1 
    tcpip2 

Define one e2fs partition, using all available space on the disk. Don’t define a swap partition--the swapoff, 
fdisk and mke2fs commands could allow someone with root to create writable disk space. (My understanding 
of Linux swap partitions was that only the first 16MB were used--anything past that in the swap partition 
could possibly be used to store data using raw I/O to the device.) 

If you choose to not install procps, ps will not work and the shutdown process will not be able to find 
process IDs to kill them. But this is irrelevant for all of our applications except the WORM daemon. (It 
definitely makes it harder for someone who’s broken in to find out what’s running in your system.) An alternate 
approach is to compile /proc support into the kernel and to install the procps package, but then to comment 
out the /proc entry in /etc/fstab--ps will then work and shutdowns are normal, but /proc cannot be mounted, 
and there’s no easy access to application memory. 

You might need some of the Slackware packages omitted to run certain applications (like syslog). Also note 
that the kernel modules and modutils packages are not needed if you run a static kernel. 

Set the root password. 

If you did not install e2fsprog, replace line 35 of /etc/rc.d/rc.S (fsck) with "echo ’fsck not available’". 

If you DID install e2fsprog, make sure you remove the lsattr and chattr commands. 

Edit /etc/HOSTNAME, /etc/resolv.conf, /etc/hosts, and /etc/rc.d/rc.inet1 appropriately. 

Edit /etc/rc.d/rc.M to not invoke rc.inet2 (unless you need these daemons). You may also want to comment 
out a number of the things started towards the end of rc.M. 


7. Configure the CDROM Disk Using the Setup Disk 

Reboot the setup disk and mount the cdrom disk under /mnt. 

Remove the lsattr and chattr commands (in /usr/bin) from the cdrom disk if you have installed the e2fsprog 
package (above). 

Copy the applications to the cdrom disk and configure them as required. 

For Apache, you will need to set your httpd.conf directives to be something like this: 

ErrorLog /var/log/error_log    (or /dev/null) 
CommonLog /dev/null common 
PidFile /dev/null 
LockFile /var/run/httpd.lock 

You will need to "touch" any files not specified as /dev/null. (Normally the httpd.lock file only exists 
briefly but this isn’t possible in our case.) 

Then set the cdrom disk to be immutable: 


cd /mnt 
chattr =i . 
chattr -R =i * 


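Switch /dev/null back to normal (you are still in /mnt, so use paths relative to the cdrom disk; an 
absolute path would change the setup disk instead): 

chattr -i dev/null 


Then adjust a few selected i-nodes for Apache, if you’re using it: 

chattr =Sa var/log/error_log 
chattr =Sa var/run/httpd.lock 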
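
Before rebooting, it may be worth checking that nothing on the cdrom disk was left writable by accident. The helper below is a sketch, not part of the original procedure: the name list_mutable is hypothetical, and it assumes the usual `lsattr` output of one flag field followed by one path per line, with no whitespace in the paths.

```shell
# list_mutable: reads "lsattr -R" output on stdin and prints every path
# whose flag field contains neither i (immutable) nor a (append only).
# Directory headers such as "./tmp:" and blank lines have fewer than
# two fields and are skipped.
list_mutable() {
    awk 'NF >= 2 && $1 !~ /[ia]/ { print $2 }'
}
```

Run from the mount point of the cdrom disk, for example `lsattr -R . 2>/dev/null | list_mutable`; ideally the only entries printed are the deliberate exceptions such as the null device.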
Reboot with just the cdrom disk. You should have a running virtual CDROM system. You’ll probably have 
to edit the /etc/rc.d scripts to have the system boot without any complaints. 

Note: another approach you can consider would be to remove the mount command from the system. The 
kernel will mount the disk read-only when the system boots. You’ll have to change the startup scripts and 
won’t be able to run Apache, but this might be an easier approach. 

8. WORM Issues 

A virtual WORM can be created by using an append-only index file, which contains number:name pairs, 
and a set of pre-existing, append-only files named "1", "2", "3", etc. When a request is made to "create" a 
new file, the daemon selects the next number and stores the association in the index file. 
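
The index mechanism can be sketched in a few lines of shell. This is an illustration only, not the daemon's code: the function names are hypothetical, the index is assumed to hold one number:name pair per line, and the return codes 2 and 3 merely mirror the "name already exists" and "no worm files left" replies used by the daemon in the appendix.

```shell
# lookup_number INDEX NAME
# Prints the number bound to NAME, or nothing if NAME is not in the index.
# Names may themselves contain colons; only the first colon separates.
lookup_number() {
    awk -F: -v name="$2" '{
        n = $1
        sub(/^[^:]*:/, "")
        if (($0 "") == (name "")) { print n; exit }
    }' "$1"
}

# next_number INDEX
# Prints the number the next "create" would use (worm files start at 1).
next_number() {
    echo $(( $(wc -l < "$1") + 1 ))
}

# create_name INDEX NAME MAX
# Appends a new number:name pair and prints the number. Returns 2 if NAME
# already exists, 3 if all MAX pre-existing worm files are in use.
create_name() {
    [ -z "$(lookup_number "$1" "$2")" ] || return 2
    n=$(next_number "$1")
    [ "$n" -le "$3" ] || return 3
    echo "$n:$2" >> "$1"
    echo "$n"
}
```

Because the index is only ever appended to, the same code works whether or not the file carries the append-only attribute.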

A simple, single-threaded, TCP lwormd server is provided in the appendix. The setup involved for this 
daemon is detailed in the comments at the start of the code. A simple protocol for clients is also detailed. 
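
As a rough sketch of the client side, the functions below only format requests in the pass:command:filename form described in the appendix, ending appended data with the "end of worm data" line that the daemon's append routine looks for. The function names are hypothetical, and how the bytes reach the daemon's port (a netcat-style tool, for instance) is left as an assumption.

```shell
# worm_request PASS CMD [NAME]
# Prints the request line: "pass:cmd:name", or "pass:cmd" for commands
# such as shut that take no file name.
worm_request() {
    if [ -n "${3:-}" ]; then
        printf '%s:%s:%s\n' "$1" "$2" "$3"
    else
        printf '%s:%s\n' "$1" "$2"
    fi
}

# worm_append PASS NAME
# Prints an append request, then the data read from stdin, then the
# terminator line.
worm_append() {
    worm_request "$1" append "$2"
    cat
    printf 'end of worm data\n'
}
```

A client would pipe the output of worm_append into its TCP connection and then read back the one-line numeric reply (0 for success).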

One problem with the append-only files: even though they’re set with the execute bits off, someone could 
write a script into one of the empty ones and then invoke it using a command like "bash 125". One possible 
way to prevent this abuse might be to pre-load this sort of data at the start: 

exit 
exit 
exit; 
exit; 

and then have the daemon ignore the first 4 lines. Another approach might be to patch the interpreters 
remaining on the system to ignore all but a select few files. 
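
The pre-loading idea can be sketched as follows. The function names are hypothetical; the guard lines have to be written while the file is still an ordinary file, before the append-only attribute is set.

```shell
# preload_guard FILE
# Writes the four guard lines into a still-empty worm file.
preload_guard() {
    printf 'exit\nexit\nexit;\nexit;\n' > "$1"
}

# read_payload FILE
# Prints the file contents without the four guard lines.
read_payload() {
    tail -n +5 "$1"
}
```

With the guard in place, running the file through a shell stops at the first line, so anything an intruder appends afterwards is never reached by that interpreter.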

9. Denial of Service Attacks 

DOS attacks are still possible against this system. If an intruder gains root, the simplest attack would be to 
shut the system down. 

inittab is another problem—the system should have only one state. Otherwise telinit could be used to put 
the system into a different state. 

To prevent this, you might want to consider removing the following programs: 

kill 
umount (if removed, you need the "S" attribute on for all append-only files) 
shutdown & halt 
ifconfig (needs to be rewritten to allow only "up") 
route (needs to be rewritten) 
telinit (if possible--might need to rewrite to only allow state 3) 

I’m sure this is not the final list... 

Obviously, there’s no special defense available to this type of system for attacks at the network interface. 

10. One (Big) Remaining Problem 

Unfortunately, kernel and daemon memory can be rewritten. This allows anyone gaining root to overwrite 
memory and build their own functions on the fly. 

To minimize the possibility of this, the following steps could be taken: 

• Substitute bash with a stripped-down shell which will only run select commands from /etc/rc.d (i.e., 
allow the boot process to proceed). 

— This might create other problems for daemons which expect normal shell-like behavior. 
• Remove all other compilers and interpreters from the system, and compile interpreted programs, or 
rewrite them in C. 

• Run without a /proc filesystem, to make it harder to modify kernel and/or application memory. 

• Remove all unused programs from the system. 

There are other measures which could be taken. Several recent Phrack articles 
(http://alt.2600.com/phrack/), specifically issues 52 and 53, detail patches to the kernel which might be useful 
in these environments. No doubt additional patches will come to light over time. 

11. Final Thoughts 

Obviously, burning the above system onto a CD and booting from that protects you against the possibility 
that you (or I) might have overlooked something. 

Even if you boot using a real CD and have no writable storage, it still might be possible to hack the system 
using a buffer-overrun hole. Despite long-standing and widespread publicity of this problem, systems continue 
to have these vulnerabilities. 

The real issue is: how can we protect the RAM-resident kernel (and applications) from the possibility that 
unknown holes in the system could result in a loss of control despite our best efforts? 

One solution might be to load the kernel and application code text segment into ROM (thus making them 
immutable). 

One day, perhaps our friends at Linux or FreeBSD will take this step, offering their kernels via a configurable 
web page (running on read-only systems, of course!), with the user downloading the requested configuration 
and loading it (and their applications) into their systems in a fashion similar to the way flash 
BIOSes are upgraded now. 
12. Appendix 

Changes needed to Apache_1.3.12 to run in a read-only environment 

Change src/http_main.c as follows: 

— Comment out lines 497 & 498 (from "ap_lock_fname =" to "getpid());"). This suppresses the 
appending of the server’s PID onto the end of the lock file name. 

— Change line 807 bitwise-or constants from 

O_CREAT | O_WRONLY | O_EXCL 
to 

O_WRONLY | O_APPEND 

(i.e., drop O_CREAT & O_EXCL and add O_APPEND). This enables a pre-existing, chattr =Sa file to 
serve as the lock file. 

Note that this change may prevent you from running multiple apache servers with all of them using 
the same httpd.conf file (and hence the same LockFile directive). 

— Then configure, make, etc. 

— Decide what logs you want, and chattr them =Sa. The httpd.lock file *must* be created (where specified 
in httpd.conf) and must be chattr =Sa as well. Note that this file does not have the .PID suffix. 

You should also chmod the files 0644. 

— Finally, I don’t think it hurts to write something like this at the start of the apache append-only files: 

exit 
exit 
exit; 
exit; 

This should make these files worthless to someone trying to append a perl or bash script into them. 
13. The Linux WORM Daemon (lwormd) 

#!/usr/bin/perl -w

# lwormd.p -- A Linux WORM Daemon
# Parts of this code were stolen from the perl manpages.

# CHANGE THESE VARIABLES and then do the rest of the installation;
# then read the Usage instructions

$serverpasswd = 'lwormd';    # pick a password for commands sent to daemon
$PORT = 9000;                # pick a port for your daemon to run on
$maxwormindex = 9;           # last pre-defined worm file (see below)

########################################################################
# INSTALLATION
#
# install this into /usr/local/lwormd
#
# mkdir worms ; cd worms     (create the worms subdir)
#
# touch wormfileindex        (this file keeps track of name:number pairings
#                             and what WORM files have been assigned)
#
# touch wormindexlock        (this file is used to lock updates
#                             to the index file)
#
# touch stdout stderr        (these allow for the detached daemon to log events)
#
# touch 1 2 3 4 ...          (however many you want; should write a script for this)
#                            (note: largest number should = $maxwormindex above)
#                            (make sure these are execute off)
#
# enter one-line password into /usr/local/lwormd/worms/serverpasswd
#
# chmod 0600 *               (set to read/write by root only)
#
# chattr =Sa *               (set to append only, synchronous writes)
#
# chattr =i .                (set current dir to immutable, to stop new file creation)
#
# cd ..
#
# chmod 0600 wormfileindex   (set to read/write by root only)
#
# chattr =Sa wormfileindex   (set to append only, synchronous writes)
#
# chattr =i .                (set current dir to immutable, to stop new file creation)
#
# Finally, update /etc/rc.d/rc.local to start /usr/local/lwormd/lwormd.p.
#
########################################################################
#
# Client Usage:
#
# connect to daemon port above and issue one of these commands
# ("pass" is the password supplied above):
#
# pass:new:filename          (create a new file)
#
# pass:append:filename       (append to the end of a file)
# appended data
# appended data
# appended data
# end of worm data           (this line ends the appended data)
#
# pass:read:filename         (read the contents of a file)
#
# pass:shut                  (shutdown the daemon)
#
########################################################################


# chdir to worms dir
chdir("/usr/local/lwormd/worms");

# detach from what started us
use POSIX qw(setsid);
close(STDIN);
close(STDOUT);
close(STDERR);
$pid = fork();
exit(0) if ($pid);                 # parent leaves
$r = POSIX::setsid();
exit(-1) if ($r == -1);

# reopen stdout & stderr
open(STDIN, "/dev/null");
open(STDOUT, ">>stdout");
open(STDERR, ">>stderr");

# write log record
$lr = "Lwormd started at " . localtime() . "\n";
print STDOUT "$lr";

# open wormfileindex and store data in a hash & array
%filenameindex = ();
@filenumberindex = ();
open(WI, "wormfileindex");
while ($wirec = <WI>) {
    chop($wirec);
    ($indexnum, $filename) = split(":", $wirec, 2);
    $filenameindex{$filename} = $indexnum;
    push(@filenumberindex, $filename);
}
close(WI);


# open socket and wait
use IO::Socket;
$server = IO::Socket::INET->new( Proto     => 'tcp',
                                 LocalPort => $PORT,
                                 Listen    => 5,
                                 Reuse     => 1);
die "can't setup server" unless $server;

$sd = 0;
while ($client = $server->accept()) {
    $client->autoflush(1);
    $req = <$client>;
    chop($req);
    # strip a trailing carriage return, if any (this pattern was garbled
    # in the printed listing and is reconstructed here)
    chop($req) if ($req =~ /\r$/);
    if ($req =~ /^$serverpasswd:/) {
        $req =~ s/^$serverpasswd://;
        if    ($req =~ /^new/)    { &new; }
        elsif ($req =~ /^read/)   { &read; }
        elsif ($req =~ /^append/) { &append; }
        elsif ($req =~ /^dump/)   { &dump; }
        elsif ($req =~ /^shut/)   { $sd = 1; }
        else { print $client "Command unknown.\n"; }
    }
    close $client;
    last if ($sd);
}
close(STDIN);
close(STDOUT);
close(STDERR);
exit;


sub new {
    # trim off the new: prefix, leaving the file name
    $req =~ s/new://;
    $newname = $req;

    # open the lock file (for some reason opening the index with
    # "+<" doesn't work when the lsattr is =Sa)
    open(WL, ">>wormindexlock");
    $rc = flock(WL, 2);
    if ($rc != 1) {
        print $client "1\n";
        close(WL);
        return;
    }

    # make sure this name does not already exist
    if (defined($filenameindex{$newname})) {
        print $client "2\n";
        close(WL);
        return;
    }

    # get a new worm file
    $newnum = scalar(@filenumberindex) + 1;    # get next number (remember,
                                               # worm files start from 1)
    if ($newnum > $maxwormindex) {
        print $client "3\n";
        close(WL);
        return;
    }

    # if file exists, use it
    if (-f "$newnum") {
        open(WI, ">>wormfileindex");
        print WI "$newnum:$newname\n";
        close(WI);
        $filenameindex{$newname} = $newnum;
        push(@filenumberindex, $newnum);
        print $client "0\n";
    } else {                                   # worm file does not exist
        print $client "4\n";
    }

    # return
    close(WL);
    return;
}

sub read {
    # trim off the read: prefix, leaving the file name
    $req =~ s/read://;
    $wormfilename = $req;

    # open the index of name/numbers, and get the corresponding number
    # note that the "number" is the actual file name in the filesystem
    $wormfilenumber = $filenameindex{$wormfilename};

    # return error if filenumber was not found
    if (!defined($wormfilenumber) || $wormfilenumber == 0) {
        print $client "1\n" if (defined($client));
        return;
    }

    # read the file and print to client
    open(WF, "$wormfilenumber");
    while ($rec = <WF>) {
        print $client "$rec" if (defined($client));
    }

    # close the file and return
    close(WF);
    #print $client "0\n" if (defined($client));
    return;
}

sub append {
    # trim off the append: prefix, leaving the file name
    $req =~ s/append://;
    $wormfilename = $req;

    # open the index of name/numbers, and get the corresponding number
    # note that the "number" is the actual file name in the filesystem
    $wormfilenumber = $filenameindex{$wormfilename};

    # return error if filenumber was not found
    if (!defined($wormfilenumber) || $wormfilenumber == 0) {
        print $client "1\n";
        return;
    }

    # open the file for appending and get an exclusive lock
    open(WF, ">>$wormfilenumber");
    $rc = flock(WF, 2);
    if ($rc != 1) { print $client "2\n"; return; }

    # turn off buffering for the file, then restore the previously
    # selected handle so later prints are not sent to WF by default
    $oldfh = select(WF);
    $| = 1;
    select($oldfh);

    # read the rest of the network data and write it to the file
    while ((defined($client)) && ($req = <$client>)) {
        last if ($req eq "end of worm data\n");
        print WF "$req";
    }

    # close the file and return
    close(WF);
    print $client "0\n" if (defined($client));
    return;
}

# this function is supposed to allow for dumping the contents
# of all worm files across a network
sub dump {
    print $client "Dump not implemented.\n" if (defined($client));
    return;
}
Resulting .config file from "make config" command: 

# 

# Automatically generated make config: don't edit 

# 

# Code maturity level options 

# 

# CONFIG_EXPERIMENTAL is not set 

# 

# Processor type and features 

# 

CONFIG_M386=y 

# CONFIG_M486 is not set 

# CONFIG_M586 is not set 

# CONFIG_M586TSC is not set 

# CONFIG_M686 is not set 
CONFIG_1GB=y 

# CONFIG_2GB is not set 

# CONFIG_MATH_EMULATION is not set 

# CONFIG_MTRR is not set 

# CONFIG_SMP is not set 

# 

# Loadable module support 

# 

# CONFIG_MODULES is not set 

# 

# General setup 

# 

CONFIG_NET=y 
CONFIG_PCI=y 

# CONFIG_PCI_GOBIOS is not set 

# CONFIG_PCI_GODIRECT is not set 
CONFIG_PCI_GOANY=y 
CONFIG_PCI_BIOS=y 
CONFIG_PCI_DIRECT=y 

CONFIG_PCI_QUIRKS=y 
CONFIG_PCI_OLD_PROC=y 

# CONFIG_MCA is not set 

# CONFIG_VISWS is not set 
CONFIG_SYSVIPC=y 

# CONFIG_BSD_PROCESS_ACCT is not set 

# CONFIG_SYSCTL is not set 

# CONFIG_BINFMT_AOUT is not set 
CONFIG_BINFMT_ELF=y 

# CONFIG_BINFMT_MISC is not set 

# CONFIG_PARPORT is not set 

# CONFIG_APM is not set 


# Plug and Play support 

# 

CONFIG_PNP=y 

# 

# Block devices 

# 

CONFIG_BLK_DEV_FD=y 
CONFIG_BLK_DEV_IDE=y 

# 

# Please see Documentation/ide.txt for help/info on IDE drives 

# 

# CONFIG_BLK_DEV_HD_IDE is not set 
CONFIG_BLK_DEV_IDEDISK=y 

CONFIG_BLK_DEV_IDECD=y 

# CONFIG_BLK_DEV_IDETAPE is not set 

# CONFIG_BLK_DEV_IDEFLOPPY is not set 

# CONFIG_BLK_DEV_IDESCSI is not set 
CONFIG_BLK_DEV_CMD640=y 
CONFIG_BLK_DEV_CMD640_ENHANCED=y 
CONFIG_BLK_DEV_RZ1000=y 

CONFIG_BLK_DEV_IDEPCI=y 
CONFIG_BLK_DEV_IDEDMA=y 

# CONFIG_BLK_DEV_OFFBOARD is not set 
CONFIG_IDEDMA_AUTO=y 

# CONFIG_IDE_CHIPSETS is not set 

# 

# Additional Block Devices 

# 

CONFIG_BLK_DEV_LOOP=y 

CONFIG_BLK_DEV_NBD=y 

# CONFIG_BLK_DEV_MD is not set 

# CONFIG_BLK_DEV_RAM is not set 

# CONFIG_BLK_DEV_XD is not set 

# CONFIG_BLK_DEV_DAC960 is not set 
CONFIG_PARIDE_PARPORT=y 

# CONFIG_PARIDE is not set 

# CONFIG_BLK_CPQ_DA is not set 

# CONFIG_BLK_DEV_HD is not set 

# 

# Networking options 

# 

# CONFIG_PACKET is not set 

# CONFIG_NETLINK is not set 

# CONFIG_FIREWALL is not set 

# CONFIG_FILTER is not set 
CONFIG_UNIX=y 
CONFIG_INET=y 

# CONFIG_IP_MULTICAST is not set 

# CONFIG_IP_ADVANCED_ROUTER is not set 

# CONFIG_IP_PNP is not set 

# CONFIG_IP_ROUTER is not set 
# CONFIG_NET_IPIP is not set 

# CONFIG_NET_IPGRE is not set 

# CONFIG_IP_ALIAS is not set 

# CONFIG_SYN_COOKIES is not set 

# 

# (it is safe to leave these untouched) 

# 

# CONFIG_INET_RARP is not set 

# CONFIG_SKB_LARGE is not set 

# 

# 

# 

# CONFIG_IPX is not set 

# CONFIG_ATALK is not set 

# 

# SCSI support 

# 

# CONFIG_SCSI is not set 

# 

# Network device support 

# 

CONFIG_NETDEVICES=y 

# 

# ARCnet devices 

# 

# CONFIG_ARCNET is not set 
CONFIG_DUMMY=y 

# CONFIG_EQUALIZER is not set 

# CONFIG_NET_SB1000 is not set 

# 

# Ethernet (10 or 100Mbit) 

# 

CONFIG_NET_ETHERNET=y 

# CONFIG_NET_VENDOR_3COM is not set 

# CONFIG_LANCE is not set 

# CONFIG_NET_VENDOR_SMC is not set 

# CONFIG_NET_VENDOR_RACAL is not set 

# CONFIG_NET_ISA is not set 
CONFIG_NET_EISA=y 

# CONFIG_PCNET32 is not set 

# CONFIG_APRICOT is not set 

# CONFIG_CS89x0 is not set 

# CONFIG_DM9102 is not set 

# CONFIG_DE4X5 is not set 

# CONFIG_DEC_ELCP is not set 

# CONFIG_DGRS is not set 

# CONFIG_EEXPRESS_PRO100 is not set 
CONFIG_NE2K_PCI=y 
# CONFIG_TLAN is not set 

# CONFIG_VIA_RHINE is not set 

# CONFIG_NET_POCKET is not set 

# CONFIG_FDDI is not set 
CONFIG_PPP=y 

# 

# CCP compressors for PPP are only built as modules 

# 

CONFIG_SLIP=y 

CONFIG_SLIP_COMPRESSED=y 

CONFIG_SLIP_SMART=y 

# CONFIG_SLIP_MODE_SLIP6 is not set 

# CONFIG_NET_RADIO is not set 

# 

# Token ring devices 

# 

# CONFIG_TR is not set 

# CONFIG_NET_FC is not set 

# 

# Wan interfaces 

# 

# CONFIG_DLCI is not set 

# CONFIG_WAN_DRIVERS is not set 

# CONFIG_SBNI is not set 

# 

# Amateur Radio support 

# 

# CONFIG_HAMRADIO is not set 

# 

# IrDA subsystem support 

# 

# CONFIG_IRDA is not set 

# 

# ISDN subsystem 

# 

# CONFIG_ISDN is not set 

# 

# Old CD-ROM drivers (not SCSI, not IDE) 

# 

# CONFIG_CD_NO_IDESCSI is not set 

# 

# Character devices 

# 

CONFIG_VT=y 

CONFIG_VT_CONSOLE=y 

CONFIG_SERIAL=y 
CONFIG_SERIAL_CONSOLE=y 

CONFIG_SERIAL_EXTENDED=y 

# CONFIG_SERIAL_MANY_PORTS is not set 

# CONFIG_SERIAL_SHARE_IRQ is not set 

# CONFIG_SERIAL_DETECT_IRQ is not set 

# CONFIG_SERIAL_MULTIPORT is not set 

# CONFIG_HUB6 is not set 

# CONFIG_SERIAL_NONSTANDARD is not set 

# CONFIG_UNIX98_PTYS is not set 

# CONFIG_MOUSE is not set 

# CONFIG_QIC02_TAPE is not set 

# CONFIG_WATCHDOG is not set 

# CONFIG_NVRAM is not set 

# CONFIG_RTC is not set 

# 

# Video For Linux 

# 

# CONFIG_VIDEO_DEV is not set 

# 

# Joystick support 

# 

# CONFIG_JOYSTICK is not set 

# CONFIG_DTLK is not set 

# 

# Ftape, the floppy tape device driver 

# 

# CONFIG_FTAPE is not set 

# 

# Filesystems 

# 

# CONFIG_QUOTA is not set 

# CONFIG_AUTOFS_FS is not set 

# CONFIG_AFFS_FS is not set 

# CONFIG_HFS_FS is not set 

# CONFIG_FAT_FS is not set 

# CONFIG_MSDOS_FS is not set 

# CONFIG_UMSDOS_FS is not set 

# CONFIG_VFAT_FS is not set 
CONFIG_ISO9660_FS=y 

# CONFIG_JOLIET is not set 
CONFIG_MINIX_FS=y 

# CONFIG_NTFS_FS is not set 

# CONFIG_HPFS_FS is not set 
CONFIG_PROC_FS=y 

# CONFIG_ROMFS_FS is not set 
CONFIG_EXT2_FS=y 

# CONFIG_SYSV_FS is not set 

# CONFIG_UFS_FS is not set 

# 

# Network File Systems 

# 

# CONFIG_CODA_FS is not set 

# CONFIG_NFS_FS is not set 

# CONFIG_SUNRPC is not set 

# CONFIG_LOCKD is not set 

# CONFIG_SMB_FS is not set 

# CONFIG_NCP_FS is not set 

# 

# Partition Types 

# 

# CONFIG_BSD_DISKLABEL is not set 

# CONFIG_MAC_PARTITION is not set 

# CONFIG_SMD_DISKLABEL is not set 

# CONFIG_SOLARIS_X86_PARTITION is not set 

# CONFIG_NLS is not set 

# 

# Console drivers 

# 

CONFIG_VGA_CONSOLE=y 

CONFIG_VIDEO_SELECT=y 

# 

# Sound 

# 

# CONFIG_SOUND is not set 

# 

# Kernel hacking 

# 

# CONFIG_MAGIC_SYSRQ is not set 
FTP note for pass-through applications 

The following perl code will allow you to get/put FTP data into/out from a perl array (which is useful when 
you have no local writable storage): 

use Net::FTP;

$w = \*STDIN;
$x = \*STDOUT;

$ftp = Net::FTP->new('localhost') or exit(1);      # connect
$ftp->login('someuser','somepass') or exit(2);      # authenticate
$ftp->type("A") or exit(3);                         # switch to ascii mode

# to read remote data use this:
$ftp->get('somefile',$x) or exit(4);                # read remote file into stdout
@x = <$x>;                                          # get stdout contents

# to write remote data use this:
foreach $y (@y) {                                   # "load up" stdin contents
    print $w "$y";
}
$ftp->put($w,'somefile') or exit(4);                # put stdin to remote file

# remember to quit
$ftp->quit();                                       # quit
Comparison of ssh and rsh* 

Marcus Vogt 
June 8, 2000 


Abstract 

The purpose of this document is to show the differences between running 
ssh processes versus rsh processes. In particular we will concentrate 
on the benefits versus the costs of doing so. SSH and its dependents provide 
secure (encrypted) communications between machines; however, this 
security comes at a cost. The exact cost varies with what exactly is being 
done, but on average we expect a twofold increase in CPU usage compared 
to RSH, and a one-third increase in memory utilisation. Given the load 
impact that rsh/rcp imposes on systems currently, this is not seen as a 
problem. Thus we have no hesitation in recommending the installation of 
SSH. 


*$Id: ssh-perf-rpt.lyx,v 1.24 2000/03/16 22:45:24 marcusv Exp marcusv $ 
1 Introduction 

We have an opportunity to implement a secure and encrypted transport medium 
via ssh between our production machines and various support machines. Currently 
these services are implemented via ftp and the like, which are vulnerable 
to a number of attacks and snooping. The implementation of ssh cannot proceed 
until we provide some data on how this new system will impact on the 
performance of the machine it will be installed upon. 

We will provide the data by transferring files of various sizes between sys¬ 
tems. This will give us a fair indication of the processor and memory load that 
will be imposed by performing these tasks. We will also compare this to their 
counterparts (i.e. rep vs sep) to have a reasonable point of reference. 


1.1 Assumptions 

In producing this report we have made the following assumptions: 

1. Kenoback is more likely to perform like the production than KenoDR due 
to Kenoback running a full test environment. 


1.2 Benefits of SSH 

The benefits of using SSH are focused around network integrity. It ensures,
using public key handshaking, that the hosts connecting are who they say they
are. It also encrypts the communications across the network, guaranteeing that
neither passwords nor sensitive data can be compromised. Finally, it can use
a further extension of public key handshaking to replace passwords, removing
the need for passwords to be entered as part of scripts.

2 Methods 

In determining how SSH will impact upon system performance in comparison
to RSH we will examine three important areas: speed, memory and CPU. The
time to execute the program in real time, along with the CPU usage, is provided
by the timex command. Memory usage is found by using ps.

Issues to note:

• file caching will affect performance.

• other users and processes impact on real times.

There are actually three points of experimental error:

1. Local machine (other tasks being executed on the machine)

2. Network (high network traffic / collisions)

3. Remote machine (other tasks being executed on the machine)
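The measurement procedure can be sketched as follows. This is an illustration in Python, not the timex-based procedure actually used for this report: it runs a command several times and averages the real, user and system times charged to the child process. The function names and the stand-in command are assumptions made for the example.

```python
import os
import subprocess
import sys
import time


def timed_run(argv):
    """Run argv once and return (real, user, system) times in seconds."""
    before = os.times()
    start = time.monotonic()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    real = time.monotonic() - start
    after = os.times()
    # children_* counters cover child processes that have been waited for.
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return real, user, system


def average(argv, runs=5):
    """Average each of the three figures over several runs."""
    samples = [timed_run(argv) for _ in range(runs)]
    return tuple(sum(column) / runs for column in zip(*samples))


if __name__ == "__main__":
    # Stand-in command; the report timed scp, rcp, ssh and rsh transfers.
    print(average([sys.executable, "-c", "sum(range(100000))"]))
```

Averaging over repeated runs dampens, but does not remove, the three sources of experimental error listed above.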
3 Kenoback

Kenoback is a Concurrent MC-7000 series running RTU OS-6.1BV30. This 
system has three 68040’s running at 33 MHz and 32Mb of real memory plus 
100Mb of swap. It is important to note that this machine is used for development 
work and testing. This machine is thus more typical of one in use. 

3.1 SCP vs RCP 

In examining the two methods for copying files between systems, we need to
analyse both processor and memory utilisation.

3.1.1 Processor Utilisation

The results shown in table 1 are the average of performing the same task
iteratively five times, capturing the system usage via the timex command. The
exact commands executed were:

“timex scp filename zeus:/tmp/”

“timex rcp filename zeus:/tmp”

Unfortunately the results for scp will be skewed for User and System as the timex
command only monitors the scp process, not the ssh process that is forked off
by scp. Although the manual states that rcp uses rsh (also forking) to communicate
with the remote host, it does not in fact do so. It is assumed that rcp holds the code
for rsh internally.


Filename        Method  File Size (bytes)  Real (s)  User (s)  System (s)
gcc-2_8_1_tar   scp              33812480    392.93      1.10       33.30
gcc-2_8_1_tar   rcp              33812480    150.33      0.37      67.154
glibc-2.1-9909  scp               7700032     97.23      0.25        7.50
glibc-2.1-9909  rcp               7700032     30.25      0.11       18.92
binutils-2.9.1  scp               5922130     86.33      0.21        5.92
binutils-2.9.1  rcp               5922130     20.65      0.06       12.32
bash-2_02_tar   scp               1510428     31.92      0.07        1.45
bash-2_02_tar   rcp               1510428      7.65      0.04        1.45
rpm-3.0.3.tar.  scp               1210690     28.93      0.04        1.18
rpm-3.0.3.tar.  rcp               1210690      7.10      0.03        1.23
ssh-1_2_26_tar  scp               1005284     24.75      0.05        0.84
ssh-1_2_26_tar  rcp               1005284      5.21      0.04        1.08
rsync-2.3.2.ta  scp                313209     22.93      0.01        0.32
rsync-2.3.2.ta  rcp                313209      3.07      0.03        0.43
ttymodes.c      scp                  3121     18.98      0.01        0.06
ttymodes.c      rcp                  3121      0.61      0.01        0.15


Table 1: File transfer comparisons via rcp/scp


To get a better indication of User and System utilisation we must review 
how ssh performs. 

Also worth noting is the larger initial time on an scp connection. This is due 
to the overhead of the initial authentication performed by ssh. 
3.2 SSH vs RSH

Again, we need to examine both processor and memory utilisation to get a clear 
picture of the performance issues. 

3.2.1 Processor Utilisation 

Given the issues we had with timex and scp it is important to compare the
performance of the underlying layers. To examine this we executed the following
commands:

“timex ssh zeus "cat filename" > /tmp/mgv/filename”

“timex rsh zeus "cat filename" > /tmp/mgv/filename”

Table 2 shows far more clearly what the impact on system resources
is likely to be when running ssh rather than rsh.


Filename        Method  File Size (bytes)  Real (s)  User (s)  System (s)
gcc-2_8_1_tar   ssh -C           33812480    201.91    138.02       11.65
gcc-2_8_1_tar   ssh              33812480    444.19    312.03       33.42
gcc-2_8_1_tar   rsh              33812480    359.64      1.25      129.17
glibc-2.1-9909  ssh               7700032    107.13     71.72        8.05
glibc-2.1-9909  rsh               7700032     82.72      0.31       28.68
binutils-2.9.1  ssh               5922130     89.49     55.44        6.14
binutils-2.9.1  rsh               5922130     66.22      0.31       21.79
bash-2_02_tar   ssh               1510428     26.58     15.46        1.74
bash-2_02_tar   rsh               1510428     17.12      0.08        6.05
rpm-3.0.3.tar.  ssh               1210690     26.62     12.62        1.51
rpm-3.0.3.tar.  rsh               1210690     15.66      0.06        4.83
ssh-1_2_26_tar  ssh               1005284     20.56     10.60        1.27
ssh-1_2_26_tar  rsh               1005284     10.31      0.09        4.01
rsync-2.3.2.ta  ssh                313209     11.26      4.01        0.53
rsync-2.3.2.ta  rsh                313209      4.52      0.04        1.36
ttymodes.c      ssh                  3121      6.62      1.03        0.33
ttymodes.c      rsh                  3121      0.65      0.03        0.17


Table 2: Comparison between rsh and ssh to transfer files directly.


Clearly we can see that there is a significantly greater utilisation in the user
area for ssh due to the encryption. Interestingly, ssh is more efficient in terms
of system utilisation. If we do a comparison of “real” time and “CPU” (User
plus System) time we get the data shown in table 3. This shows that
we have approximately a three fold increase in overall CPU utilisation. One saving
grace is that there is a significant decrease in system time, which means
that ssh will be kinder to system resources. In particular, the nice command
may be used effectively to guarantee minimal impact on the system. The real
time impact appears to be around two fold, depending on system load and on
the file size.
3.2.2 Memory Utilisation

To compare the memory utilisation for rsh and ssh we used:

“ps -evf | grep ttyXX”

The results from this process are shown in table 4. Interestingly, Kenoback
calls two instances of rsh whilst running. ssh runs two for the initial authentication,
but once this is completed it drops down to one. We can see that the
difference between the two systems is an additional 80Kb, or 33% of that used
by rsh. Given that Kenoback has 32 Mb of real and 100Mb of virtual memory
this imposition is considered insignificant.
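The 80Kb and 33% figures can be reproduced from table 4 with a few lines of Python. Reading the Total column as instances multiplied by (SIZE + TSIZ) is an inference from the published numbers, not something the report states; the function name is illustrative.

```python
def footprint_kb(instances, size_kb, tsiz_kb):
    """Total footprint, assuming each process costs SIZE + TSIZ kilobytes."""
    return instances * (size_kb + tsiz_kb)


rsh_total = footprint_kb(2, 64, 56)      # two rsh processes
ssh_total = footprint_kb(1, 212, 108)    # one ssh process after authentication
extra_kb = ssh_total - rsh_total
extra_pct = 100 * extra_kb / rsh_total

print(rsh_total, ssh_total, extra_kb, round(extra_pct))   # 240 320 80 33
```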


3.3 Summary / Notes 

Interesting to note was that even though Kenoback is a multiple CPU system, 
ssh only used one CPU. The other two CPU’s were 90% idle. 


Filename        File Size (bytes)  Real (s)  Real (%)  CPU (s)  CPU (%)
gcc-2_8_1_tar            33812480     84.55       24%   215.03     165%
glibc-2.1-9909            7700032     24.41       30%    51.40     175%
binutils-2.9.1            5922130     23.27       35%    40.10     179%
bash-2_02_tar             1510428      9.46       55%    11.07     181%
rpm-3.0.3.tar.            1210690     10.96       70%     9.24     189%
ssh-1_2_26_tar            1005284     10.25       99%     7.77     189%
rsync-2.3.2.ta             313209      6.74      149%     3.14     224%
ttymodes.c                   3121      5.97      920%     1.16     580%


Table 3: Increase in real and CPU times when using ssh as compared to rsh.
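The entries in table 3 follow from table 2: the increase is the ssh figure minus the rsh figure, and the percentage is that increase relative to rsh, with CPU time taken as User plus System. A short Python check for the gcc row is given below; the function name is illustrative.

```python
def increase(ssh, rsh):
    """Return (absolute increase, percentage increase) of ssh over rsh."""
    delta = ssh - rsh
    return delta, 100 * delta / rsh


# gcc-2_8_1_tar on Kenoback, from table 2: (real, user, system) in seconds
ssh = (444.19, 312.03, 33.42)
rsh = (359.64, 1.25, 129.17)

real_s, real_pct = increase(ssh[0], rsh[0])
cpu_s, cpu_pct = increase(ssh[1] + ssh[2], rsh[1] + rsh[2])

# prints: real +84.55s (24%), cpu +215.03s (165%)
print(f"real +{real_s:.2f}s ({real_pct:.0f}%), cpu +{cpu_s:.2f}s ({cpu_pct:.0f}%)")
```

Because the percentages are relative to rsh, a row near 100% means ssh took about twice as long as rsh, and a row near 200% about three times as long.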
4 Maxx

Maxx is a Concurrent Maxion 9000 running RTU OS-6.2V32. This system has only
a single RM4400MC (MIPS) CPU running at X MHz with 64 Mb real
memory and 100 Mb swap.


4.1 SCP vs RCP 

4.1.1 Processor Utilisation 

The same technique was used for gathering the data for Maxx as was used
with Kenoback, i.e. the average of performing the same task iteratively five
times, capturing the system usage via the timex command. The results from
this process are shown in table 5. The exact commands executed were:

“timex scp filename zeus:/tmp/”

“timex rcp filename zeus:/tmp”

Unfortunately the results for scp will be skewed for User and System as the timex
command only monitors the scp process, not the ssh process that is forked off
by scp. Although the manual states that rcp uses rsh (also forking) to communicate
with the remote host, it does not in fact do so. It is assumed that rcp holds the code
for rsh internally.

4.2 SSH vs RSH 

4.2.1 Processor Utilisation 

Given the issues we had with timex and scp it is important to compare the 
performance of the underlying layers. To examine this we executed the following 
commands 

“timex ssh zeus "cat filename" > /tmp/mgv/filename”

“timex rsh zeus "cat filename" > /tmp/mgv/filename”

Table 6 shows far more clearly what the impact on system resources
is likely to be when running ssh rather than rsh.

Clearly we can see that there is a significantly greater utilisation in the user
area for ssh due to the encryption. Interestingly, ssh is more efficient in terms of
system utilisation. If we do a comparison of real time and CPU (User plus
System) time we get the data shown in table 7. This shows that we have
approximately a two fold increase in real time, although it should be noted that the
lower end of the spectrum is distorted by a three second connection phase. We
also have a four fold increase in CPU time. We can use the nice command to
effectively guarantee minimal impact on the mission critical applications.


Method  Instances  SIZE (Kb)  TSIZ (Kb)  Total (Kb)
rsh             2         64         56         240
ssh            1*        212        108         320


Table 4: Comparison between rsh and ssh in memory utilisation (Kb).
Filename        Method  File Size (bytes)  Real (s)  User (s)  System (s)
gcc-2_8_1_tar   scp              33812480    164.68      0.35        7.41
gcc-2_8_1_tar   rcp              33812480     68.08      0.11       13.29
ATF.01          scp              12369920     63.44      0.14        2.47
ATF.01          rcp              12369920     24.10      0.04        4.92
glibc-2.1-9909  scp               7700032     42.65      0.09        1.89
glibc-2.1-9909  rcp               7700032     20.53      0.03        4.65
binutils-2.9.1  scp               5922130     33.91      0.04        1.19
binutils-2.9.1  rcp               5922130     15.04      0.02        2.85
bash-2_02_tar   scp               1510428     17.74      0.02        0.19
bash-2_02_tar   rcp               1510428     12.37      0.01        0.21
rpm-3.0.3.tar.  scp               1210690     13.51      0.01        0.19
rpm-3.0.3.tar.  rcp               1210690      8.15      0.00        0.15
ssh-1_2_26_tar  scp               1005284     14.19      0.01        0.17
ssh-1_2_26_tar  rcp               1005284      6.82      0.00        0.20
rsync-2.3.2.ta  scp                313209     10.47      0.01        0.07
rsync-2.3.2.ta  rcp                313209      4.53      0.00        0.06
ttymodes.c      scp                  3121      6.12      0.00        0.02
ttymodes.c      rcp                  3121      1.46      0.00        0.05


Table 5: Comparison of processor utilisation between SCP and RCP on Maxx


Filename        Method  File Size (bytes)  Real (s)  User (s)  System (s)
gcc-2_8_1_tar   ssh -C           33812480    161.08     53.81        3.89
gcc-2_8_1_tar   ssh              33812480    318.24    111.76       11.96
gcc-2_8_1_tar   rsh              33812480    236.57      0.60       24.75
ATF.01          ssh -C           12369920     79.64     23.73        1.96
ATF.01          ssh              12369920    119.57     40.84        4.15
ATF.01          rsh              12369920     93.21      0.21        8.72
glibc-2.1-9909  ssh               7700032     80.53     25.70        2.91
glibc-2.1-9909  rsh               7700032     53.62      0.11        5.38
binutils-2.9.1  ssh               5922130     61.82     19.91        2.11
binutils-2.9.1  rsh               5922130     42.01      0.10        4.22
bash-2_02_tar   ssh               1510428     20.40      5.21        0.63
bash-2_02_tar   rsh               1510428      6.36      0.01        1.00
rpm-3.0.3.tar.  ssh               1210690     18.14      4.29        0.48
rpm-3.0.3.tar.  rsh               1210690      4.86      0.03        0.80
ssh-1_2_26_tar  ssh               1005284     13.87      3.68        0.37
ssh-1_2_26_tar  rsh               1005284      4.12      0.02        0.70
rsync-2.3.2.ta  ssh                313209      8.93      1.32        0.19
rsync-2.3.2.ta  rsh                313209      2.20      0.00        0.22
ttymodes.c      ssh                  3121      5.05      0.29        0.09
ttymodes.c      rsh                  3121      1.11      0.01        0.05


Table 6: Comparison of processor utilisation between ssh and rsh on Maxx.
5 KenoDR

KenoDR is a Concurrent MC-7000 series running RTU OS-6.1BV30. This system
has three 68040’s running at 25MHz and 32Mb of real memory plus 100Mb
of swap.

5.1 SCP versus RCP 

5.1.1 Processor Utilisation 

KenoDR is similar to Kenoback, except that it is running at 25 MHz. We would
have expected the performance of both rcp and scp to be degraded. However, as
this is a clean system running only the default system processes and the tests,
it actually outperforms Kenoback, as shown in Table 8.

5.2 SSH versus RSH 

Again, we need to examine both processor and memory utilisation to get a clear 
picture of the performance issues. 

5.2.1 Processor Utilisation 

Given the issues we had with timex and scp it is important to compare the
performance of the underlying layers. To examine this we executed the following
commands:

“timex ssh zeus "cat filename" > /tmp/mgv/filename”

“timex rsh zeus "cat filename" > /tmp/mgv/filename”

Table 9 shows in more detail what the impact on system resources
is likely to be when running ssh rather than rsh. Again we can see that there is a
significantly greater utilisation in the user area for ssh due to the encryption. If
we do a comparison of “real” time and “CPU” (User plus System) time we get
the data shown in table 10. This shows that we have approximately a
two fold increase in CPU time. The real time impact is around an 80% increase,
depending on the file size. The initial connect time of about seven seconds
distorts the lower end of the spectrum.


Filename        File Size (bytes)  Real (s)  Real (%)  CPU (s)  CPU (%)
gcc-2_8_1_tar            33812480     81.67       35%    98.37     388%
ATF.01                   12369920     26.36       28%    36.06     404%
glibc-2.1-9909            7700032     26.91       50%    23.12     421%
binutils-2.9.1            5922130     19.81       47%    17.70     410%
bash-2_02_tar             1510428     14.04      221%     4.83     478%
rpm-3.0.3.tar.            1210690     13.28      273%     3.94     475%
ssh-1_2_26_tar            1005284      9.75      237%     3.33     463%
rsync-2.3.2.ta             313209      6.73      306%     1.29     586%
ttymodes.c                   3121      3.94      354%     0.32     533%


Table 7: Comparison between rsh and ssh on Maxx
Filename        Method  File Size (bytes)  Real (s)  User (s)  System (s)
gcc-2_8_1_tar   scp              33812480    518.47      1.88       47.51
gcc-2_8_1_tar   rcp              33812480    127.37      0.36      104.11
glibc-2.1-9909  scp               7700032    128.56      0.48       10.48
glibc-2.1-9909  rcp               7700032     30.88      0.11       23.72
binutils-2.9.1  scp               5922130     97.20      0.37        8.25
binutils-2.9.1  rcp               5922130     24.51      0.10       18.19
bash-2_02_tar   scp               1510428     30.84      0.05        1.78
bash-2_02_tar   rcp               1510428      6.02      0.06        4.50
rpm-3.0.3.tar.  scp               1210690     25.83      0.10        1.37
rpm-3.0.3.tar.  rcp               1210690      4.85      0.06        3.58
ssh-1_2_26_tar  scp               1005284     22.86      0.05        1.15
ssh-1_2_26_tar  rcp               1005284      4.17      0.05        3.03
rsync-2.3.2.ta  scp                313209     11.23      0.02        0.33
rsync-2.3.2.ta  rcp                313209      2.63      0.02        1.00
ttymodes.c      scp                  3121      9.57      0.00        0.05
ttymodes.c      rcp                  3121      0.68      0.03        0.19


Table 8: Comparison between rcp and scp on KenoDR.


Filename        Method  File Size (bytes)  Real (s)  User (s)  System (s)
gcc-2_8_1_tar   ssh              33812480    600.52    416.43       42.89
gcc-2_8_1_tar   rsh              33812480    470.06      1.60      169.08
glibc-2.1-9909  ssh               7700032    137.87     96.01        9.81
glibc-2.1-9909  rsh               7700032    105.53      0.48       37.79
binutils-2.9.1  ssh               5922130    112.72     74.16        7.73
binutils-2.9.1  rsh               5922130     76.54      0.29       29.03
bash-2_02_tar   ssh               1510428     36.64     19.78        2.28
bash-2_02_tar   rsh               1510428     21.27      0.11        7.47
rpm-3.0.3.tar.  ssh               1210690     28.80     16.21        1.93
rpm-3.0.3.tar.  rsh               1210690     15.36      0.14        6.00
ssh-1_2_26_tar  ssh               1005284     25.46     13.67        1.55
ssh-1_2_26_tar  rsh               1005284     13.48      0.09        4.98
rsync-2.3.2.ta  ssh                313209     13.80      5.06        0.75
rsync-2.3.2.ta  rsh                313209      4.47      0.04        1.68
ttymodes.c      ssh                  3121      8.13      1.34        0.33
ttymodes.c      rsh                  3121      0.72      0.04        0.22


Table 9: Comparison of transfer times for rsh and ssh for KenoDR.
5.3 Summary / Notes

KenoDR was similar to Kenoback in that two of the three CPU’s were mostly 
idle during this process. 


6 Conclusion 

The data in this report demonstrates that the significant integrity (security)
benefits provided by SSH come at a cost. Given that the increase in transfer time is not
significant, the benefits provided far outweigh the capacity costs.

In comparing the two systems we can see that the SSH group of programs has a
greater requirement for system resources than the RSH
group of programs. This overhead consists of two components: initial handshake
and encryption. The initial handshake that SSH makes to verify that each of
the communicating hosts is who it says it is adds between four seconds
(on Maxx) and seven seconds (on the Keno systems) of real time to the transfer.
This initial handshake skews the percentage difference between the real times
for small file sizes, where it can take most of the time. The encryption
overhead causes an increase of approximately twice the CPU resources
of rsh/rcp on the Keno systems and an increase of approximately four times the
CPU resources of rsh/rcp on Maxx. The additional time on Maxx is most likely
due to the RISC nature of the CPU. These increases, given the impact that
rcp/rsh currently has on the systems, are not considered to be a serious impact
on performance.

Due to the high proportion of time that the SSH process spends in user 
space, we can use the “nice” command to tune the performance of the systems. 
The “nice” command affects the scheduling priority given to the process such 
that it can be made to only run when the CPU is idle. This will impact upon 
the real time required to run the “niced” process. 
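The tuning idea can be illustrated with a short Python sketch that starts a command at reduced priority. The function name and the increment of 10 are assumptions made for the example; a real deployment would wrap the actual ssh or scp invocation.

```python
import os
import subprocess
import sys


def run_niced(argv, increment=10):
    """Run argv with its niceness raised by `increment` (lower priority)."""
    return subprocess.run(
        argv,
        # Runs in the child just before exec; raising niceness needs no privilege.
        preexec_fn=lambda: os.nice(increment),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Stand-in for an ssh or scp transfer: the child reports its own niceness.
    child = run_niced([sys.executable, "-c", "import os; print(os.nice(0))"])
    print("parent niceness:", os.nice(0), "child niceness:", child.stdout.strip())
```

The same effect is available from the shell by prefixing the command with nice.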

The memory footprint of SSH in comparison to RSH is increased by a third. 
Given the real memory on these machines, the impact will be negligible. 

Given this data, the integrity (security) benefits provided by SSH far
outweigh the capacity costs.


Filename        File Size (bytes)  Real (s)  Real (%)  CPU (s)  CPU (%)
gcc-2_8_1_tar            33812480    130.46       28%   288.64     169%
glibc-2.1-9909            7700032     32.34       31%    67.55     177%
binutils-2.9.1            5922130     36.18       47%    52.57     179%
bash-2_02_tar             1510428     15.37       72%    14.48     191%
rpm-3.0.3.tar.            1210690     13.44       87%    12.00     195%
ssh-1_2_26_tar            1005284     11.98       89%    10.14     200%
rsync-2.3.2.ta             313209      9.33      209%     4.09     238%
ttymodes.c                   3121      7.41     1030%     1.41     542%


Table 10: Increase in real and CPU times when using ssh as compared to rsh
for KenoDR.
KuangPlus: Automating Vulnerability Detection

Warren Toomey & Jeff Howard 

School of Computer Science, Australian Defence Force Academy 

Abstract 


System administrators need good tools to detect system security vulnerabilities (bugs 
and configuration errors) on a timely basis. This paper examines a tool called 
KuangPlus, which helps to automate vulnerability detection, and which keeps its 
database of known problems up to date via interpreted information downloaded from 
vendors. 

Introduction 

One of the dilemmas facing systems administrators today is the amount of time they need to spend 
finding and fixing known security deficiencies in their systems. Information about new security 
deficiencies is made available in a timely fashion from operating systems vendors, application 
vendors, computer emergency response teams and other groups interested in computer security. A 
diligent sysadmin could spend every working hour monitoring these sources, determining if the 
local system is affected, and taking the steps to rectify any holes found. 

Currently, deficiency reports are usually written in a human language, e.g. English, and describe
what the problem is and how it affects a system’s security. In some cases, exploits or other
programs are available to test if a system has a given weakness. These reports and programs are
often digitally signed with a public key cryptosystem, so that the system administrator can verify
that they did come from a particular vendor, and that the report or program has not been tampered
with.

In many cases, newly-found security holes give an attacker full system rights, e.g. to become ‘root’
under Unix or ‘administrator’ under NT. In other cases, the holes give an attacker limited system
rights. However, an attacker may combine several existing system deficiencies to
gain greater system rights than any single hole gives by itself. The vendor reports about individual security
holes obviously cannot describe the effect of combined deficiencies.


Existing tools like COPS, Satan and Nessus allow a sysadmin to scan a system for known software 
and configuration vulnerabilities. The Kuang tool, part of the COPS package, can detect chains of 
configuration mistakes which when combined can be used to penetrate a system’s security. 
However, all these tools rely on a database of known problems which are only updated when new 
releases are made; these tools do not keep up with the daily round of new vulnerabilities. 

We have a situation where 

• a tool like Kuang can provide an automated way of determining a system’s vulnerability to 
known security holes and their combination, but the tool does not track newly-found security 
deficiencies; and 
• vendors and computer security groups provide timely reports of newly-found security
deficiencies in a tamper-proof fashion, but only in a format which must be processed by a
human.

It seems obvious that, if the computer security community could be persuaded to provide details of
security deficiencies in a rule-based format, then these rules could be processed by a Kuang-like
inference engine to automatically test a system’s vulnerability to the deficiencies.


In order for such a combination to actually be taken up by both the providers of such rulesets, and 
by the end-users of the rulesets, such a system must have a number of characteristics: 


• Vendors must produce security reports of new holes in ruleset form.

• One or more mechanisms must allow end-users to obtain new rules quickly and

• End-users must be able to trust the rules obtained: they must be able to verify who created the
rules, and verify that the rules have not been tampered with.

• As the rules must be executed on the end-user’s system, they must be written in a fashion that
is relatively easy to read and understand. The rules must therefore be transmitted in source
code form.

• A Kuang-like inference engine should form part of the system, as this can determine the
effect of deficiency combinations. Again, the engine must be distributed in source code form,
in a way that identifies the author and shows that it has not been tampered with.

• The copy of the software and rulesets on the end-user’s system must be able to verify its own
‘intactness’ before it is used each time. This prevents attackers from exploiting existing
system holes, and modifying the system to prevent detection of the deficiency. Therefore,
sections of the system should be designed to be rarely modified, and to be placed on read-only

• The system and the rules should be implemented in a machine- and system-independent
language which has access to each system’s APIs.

• The system should be able to obtain new rules from various rule sources, verify their author
and integrity, integrate them into the local system, apply them, and report new system
deficiencies to the system administrator in one operation.
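The ‘intactness’ requirement can be sketched as a digest comparison. The Python below is a simplification and not the KuangPlus mechanism: it checks SHA-256 digests against a manifest, whereas the criteria above call for public-key signatures that also identify the author. The function names and the manifest layout are assumptions made for the example.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def tampered_files(manifest):
    """Return the paths whose current digest differs from the recorded one.

    `manifest` maps path -> expected hex digest. It is assumed to be kept
    where an attacker cannot modify it, for example on read-only storage.
    """
    bad = []
    for path, expected in manifest.items():
        try:
            actual = sha256_of(path)
        except OSError:
            bad.append(path)  # a missing or unreadable file counts as not intact
            continue
        if not hmac.compare_digest(actual, expected):
            bad.append(path)
    return bad
```

A tool built this way would refuse to run while tampered_files() returns a non-empty list.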


KuangPlus is a tool which has been designed to meet the criteria listed above. It was originally
specified as the topic for a Masters’ project. Warren Toomey constructed the initial design for the
tool in early 1999. This was passed to Jeff Howard, who improved the design significantly and
implemented the prototype of KuangPlus at the end of 1999.


Design of KuangPlus 

Before the authors sat down and constructed KuangPlus, we drew up a set of design guidelines that
would ensure we met the criteria outlined above.


Separate the tool from the database of vulnerabilities. In this way the tool will be flexible and
will be able to detect new vulnerabilities as soon as they are added to the database.


Generality:

Implement the tool in such a way that it is independent of the platform and operating system it
is running on.



AUUG2K - Enterprise Security, Enterprise Linux 


Maintenance: 

Have a central core of static (unchanging) code with dynamic rules representing the 
“database” of vulnerabilities loaded at runtime. The central core will not require 
maintenance as new vulnerabilities are discovered. 

Trust: 

The tool should be simple and obvious in its design and secure in its operation. 

The source for the tool and the rules must be in an easy to read format that gives the system 
administrator the confidence to use them. 

Downloaded rules should only be executed if their author can be determined with certainty, 
and if the sysadmin permits rules from that author to be executed. 

Completeness: 

Use an inference engine to reveal complex vulnerabilities as well as simple ones. This is a
direct development from the Kuang approach, where a backward chaining, goal based, breadth
first search inference engine was used.

Ease of use: 

Have a well described language for “rules” and have clear instructions for creating them, in 
order to make it easy for vendors and other interested parties to generate rules. 

The tool should generate useful information suitable to a wide range of system administrators, 
from the novice to the experienced. 

Choice of Implementation Language 

Perl was chosen for the language to implement KuangPlus for several reasons. It is commonly 
available on a broad range of systems, and has a rich standard library which will obviate the need to 
“re-code” the interface between the tool and each specific platform. 

KuangPlus is going to run in an environment where trust is very important. Perl executes 
interpreted code: this will allow the system administrator to read the code of KuangPlus’ core and 
any downloaded rulesets. Perl also provides features such as ‘taint’ mode and ‘use strict’, which 
help to prevent the evaluation of information from an untrusted source. In version 5.004 of Perl, the 
‘Safe’ module was further refined, and this has been used in the prototype to prevent the rules 
interfering with either the KuangPlus core or the environment in which the tool is running. 

Some Terminology 

Before we look at the overall structure for KuangPlus, we need to introduce some terminology to 
describe what is downloaded on the fly from vendors, and what is used by the KuangPlus core 
inference engine to determine the existence of security vulnerabilities. 

A maxim is a small piece of Perl code which is written by a vendor, security organisation, or 
security interest group to detect a security vulnerability on a system. The maxim will be digitally 
signed by its author and when downloaded, will be run within a safe ‘sandbox’ environment within 
KuangPlus. 



If the maxim detects a system problem during execution, it will produce one or more rules which 
describe the problem. Each rule has an initial state, an end state, and an operation which will allow 
the transition from one state to another. 

For example, imagine that a junior sysadmin on a Unix system has left their home .cshrc file 
world-writable. Insertion of csh commands into this file would allow any user to masquerade as 
this sysadmin. A maxim written to detect this vulnerability might produce these rules. 


Initial State   Operation             Final State
Any user        Write fred’s .cshrc   User fred
Any user        Write fred’s .cshrc   Group operator


When all of the maxims available to KuangPlus have been executed and generated a set of rules, 
the inference engine in KuangPlus will attempt to chain them together to create one or more plans. 
A plan is a single instance of a chain of rules which allows progression from a ‘known’ state to a 
‘goal’ state. For example, the desired plan might be Unknown external user -> Root/Administrator. 
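The chaining of rules into a plan can be sketched in a few lines of Python. This is an illustration of backward chaining with a breadth-first search and not the KuangPlus engine itself, which is written in Perl. The third rule is hypothetical, added only so that a two-step plan exists, and the function name is an assumption.

```python
from collections import deque

# Each rule is (initial state, operation, final state), as in the example rules above.
RULES = [
    ("Any user", "Write fred's .cshrc", "User fred"),
    ("Any user", "Write fred's .cshrc", "Group operator"),
    # Hypothetical extra rule, added only so that a two-step plan exists.
    ("Group operator", "Replace operator-writable binary", "Root"),
]


def find_plan(rules, known, goal):
    """Breadth-first backward chaining from `goal` towards `known`.

    Returns the shortest list of rules leading from `known` to `goal`,
    or None if the rules cannot be chained that far.
    """
    queue = deque([(goal, [])])   # (state still to be reached, rules that follow it)
    seen = {goal}
    while queue:
        state, tail = queue.popleft()
        if state == known:
            return tail
        for rule in rules:
            initial, _operation, final = rule
            if final == state and initial not in seen:
                seen.add(initial)
                queue.append((initial, [rule] + tail))
    return None


if __name__ == "__main__":
    for initial, operation, final in find_plan(RULES, "Any user", "Root"):
        print(f"{initial} -> [{operation}] -> {final}")
```

Searching backwards from the goal means that rules which cannot contribute to the goal state are never expanded, and the breadth-first order yields the shortest plan first.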

KuangPlus Structure 
Figure 1: KuangPlus Design Concept


KuangPlus will be composed of three modules (refer to Figure 1). The first module will provide the
‘front end’ to the tool. It will provide the user interface, handle the loading of maxims and will
build a search space of rules. The second module will contain the ‘inference engine’, which will be
invoked with a reference to the search space of rules and will return any successful plans (i.e.
exploits) found. The third module is suggested by the use of the Perl ‘Safe’ module and will
encompass any routines which should be available to the rules as they are evaluated within the
‘Safe’ compartment.


Use of the Perl ‘Safe’ Module 

The existence of the ‘Safe’ module was a minor revelation during the development of the prototype. 
It was stumbled across when one author was perusing the security related items discussed in the 
“Programming Perl” book (Wall, Christiansen & Schwartz, 1996). 
There are three implicit properties which the process of loading and evaluating maxims should
exhibit:

1. The KuangPlus main logic should be guaranteed to be quarantined from any
interference arising from the evaluation of the maxims.

2. When each maxim is evaluated, there should be no residual effect on the evaluation
environment caused by a previously evaluated maxim. Stating this same property from the
other point of view, having been evaluated, a maxim shouldn’t be able to leave any residual
effect that might affect subsequent maxim evaluation.

3. In the process of evaluating a maxim, there should be no unplanned interaction with the
system on which it is being evaluated.

The use of the ‘Safe’ module with a set of routines which allow and control interaction with the
system satisfies the above three requirements. The Safe module is part of the standard Perl library
in version 5.004 of Perl. It enables Perl code to be evaluated in a restricted
environment where the only variables and routines which it can ‘see’ are those explicitly ‘shared’ into its
environment. This should satisfy properties 1 and 3 presented above. By loading each piece of code
into a new ‘compartment’, the possibility of rules interfering with each other should be eliminated,
which will satisfy property 2 above. Similar sandbox execution environments exist in
other languages such as Java and SafeTcl.
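The ‘one compartment per maxim’ idea can be illustrated in Python, with a strong caveat: the sketch below gives namespace isolation only. Python has no equivalent of Perl’s Safe module in its standard library, and exec() with a restricted namespace does not stop hostile code, so this shows properties 1 and 2 by analogy and nothing more. All names in it are hypothetical.

```python
def evaluate_maxim(source, shared):
    """Evaluate one maxim in a fresh namespace and return the rules it emits.

    Only the names in `shared` are visible to the maxim, plus an `emit`
    callback. NOTE: this is namespace isolation, not a security sandbox.
    """
    rules = []
    namespace = {"__builtins__": {}, "emit": rules.append}
    namespace.update(shared)
    exec(source, namespace)   # a new dict per call, so nothing carries over
    return rules


# Hypothetical helper shared into every compartment.
def world_writable(path):
    return path == "/home/fred/.cshrc"   # stand-in for a real permission check


MAXIM_A = """
leftover = 'set by maxim A'
if world_writable('/home/fred/.cshrc'):
    emit(('Any user', "Write fred's .cshrc", 'User fred'))
"""

if __name__ == "__main__":
    shared = {"world_writable": world_writable}
    print(evaluate_maxim(MAXIM_A, shared))
    try:
        # A second maxim cannot see the variable the first one set.
        evaluate_maxim("emit(leftover)", shared)
    except NameError:
        print("no residue from the previous maxim")
```

Each call builds its own namespace, so a variable set by one maxim is not visible to the next, and the maxim can reach the host system only through the routines explicitly shared into it.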

Whilst the use of the ‘Safe’ module should give the users of the tool confidence that the operation
of KuangPlus is reliable, there are some things which it can’t protect against which are worth
noting. The potential for code to consume the CPU or memory of the host system is identified as a
means by which clumsy or malicious rules could prevent the calling script from ever finishing.
There are also complex issues surrounding the possibility of disclosure of environment variables
and side effects that might occur if the compartment is able to access variables within the
namespace of routines ‘shared’ into it. Whilst the consumption of memory and CPU is beyond our
control at this stage, we have modified the design of our prototype so as to avoid these other
situations.

Implementation of the KuangPlus Prototype 

A number of time and other constraints were placed on the implementation of KuangPlus by
Jeff Howard’s Masters’ project. The project was therefore limited to the development of a working
prototype which would prove the KuangPlus concept. One notable omission was support for
digital signatures for maxims. Despite these constraints, Jeff produced a working system that can
be easily extended to become a final version of KuangPlus.

The design of the KuangPlus prototype is shown in Figure 2 below. It is composed of four logical 
units: the front-end, the inference engine, the ‘safe’ operations and the maxims themselves. 
Ignoring the maxims, the prototype consists of 620 lines of well-commented Perl code. 

The front-end is the program which the user invokes. It handles resolving the command line
options; setting up the run-time environment as required including initialising variables and loading 
additional modules; evaluating or rejecting each of the maxims; invoking the inference engine; and 
handling the results in some meaningful way. Of course, KuangPlus can be invoked automatically 
at a set time without any manual involvement. 

The inference engine is of the “backwards chaining, goal based, breadth first” type. In simple 
terms this means that, based on a nominated ‘goal’, the logic will look through the search space of 
rules to see which rules can be combined (‘chained’) to achieve the goal given some initial 
condition. The logic is such that it starts with the goal (hence the descriptive terms ‘backwards’ and 
‘goal based’), and attempts to find a non-empty chain of rules which will achieve the goal. Having 
found a non-empty set of matching rules, these rules will then act as temporary goals, for which
the search space will be re-examined to see which rules will allow these new goals to be achieved 
(hence the application of the term ‘breadth first’). The process repeats until a rule is found which 
represents an initial condition or there are no matching rules. At this point the work of the inference 
engine is complete. 
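
The search just described can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the prototype's Perl engine: the function name find_plan, the tuple layout of a rule and the example rules themselves are all hypothetical.

```python
from collections import deque


def find_plan(rules, goal, initial):
    """Backward-chaining, goal-based, breadth-first search.

    rules: iterable of (precondition, method, result) tuples.
    Returns the rules in the order an intruder would apply them to get
    from `initial` to `goal`, or None if no chain exists.
    """
    queue = deque([(goal, [])])  # (current sub-goal, rules chained so far)
    seen = {goal}
    while queue:
        subgoal, chain = queue.popleft()
        for rule in rules:
            precondition, _method, result = rule
            if result != subgoal:
                continue  # this rule does not achieve the current sub-goal
            new_chain = [rule] + chain
            if precondition == initial:
                return new_chain  # reached an initial condition
            if precondition not in seen:
                seen.add(precondition)  # the precondition becomes a temporary goal
                queue.append((precondition, new_chain))
    return None


# Hypothetical rules, written as (precondition, method, result).
rules = [
    ("group wheel", "can overwrite /etc/passwd", "user root"),
    ("group operator", "can overwrite /etc/group", "group wheel"),
    ("user any", "can overwrite /home/staff/fred/.cshrc", "group operator"),
    ("user any", "can overwrite /home/staff/fred/.cshrc", "user fred"),
]
plan = find_plan(rules, goal="user root", initial="user any")
no_plan = find_plan(rules, goal="user root", initial="user nobody")
```

Because the queue is processed first-in, first-out, every rule that matches the current goal is examined before any of the new temporary goals, which is what makes the search breadth first.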

The general properties of the ‘Safe’ module have been discussed already. For the purpose of the
prototype, the default set of permitted operations in Perl has been used. This decision was based on
a need to develop a working prototype and in the knowledge that the methods available in the Safe
module and the associated "Opcode" module will allow this decision to be easily reviewed and
further restrictions added as necessary.

An example of the error message which is generated at runtime by a maxim trying to access an 
operation to which it has not been given access is shown below. In this example, the stat operation 
was not available to the maxim: 

Use of uninitialized value at ./safe.pl line 31.

$KuangPlus::rtn = (stat trapped by operation mask at 
/home/Kuangplus/unsafe.pl line 16.) 

If a given maxim attempts to access an operation which is not available to it, the attempt will be 
detected at runtime, an error similar to that shown above will be written to the screen and the 
maxim in question will not be evaluated. 

Interface Between Front-End and Maxims 

Each maxim is loaded from a file and evaluated in its own sandbox created by the ‘Safe’ module. 
The interaction between the maxim and the environment in which it is evaluated is restricted to two 
types: the creation of rules in a specific associative array which is ‘shared’ into the compartment, 
and the use of a set of subroutines also ‘shared’ into the compartment. 

In the KuangPlus prototype, the subroutines available to the maxims are caching emulations of
normal Perl system calls: stat(), uname(), getpwent(), getpwnam() and getpwuid(). These give
a maxim the ability to derive account and system specific information. In the full-blown
KuangPlus, many other safe routines will be added to the sandbox.

Because the subroutines have the same name as the operating system equivalents, maxims can be 
tested outside of the ‘Safe’ environment. The emulated subroutines also cache information: system 
information such as user-ids need only be obtained once, and will then be served to maxims from 
the cache. The biggest advantage though is that maxims must use these routines and so there is a 
tight control over what information about the system is available to them and how they can get at it. 
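
The caching behaviour can be sketched as follows. This is a hedged Python illustration rather than the prototype's Perl routines: the cached helper, the slow_lookup stand-in and the call counter are hypothetical and exist only to show that repeated requests are served from the cache.

```python
import os
import platform


def cached(fetch):
    """Wrap `fetch` so each distinct argument list is looked up only once."""
    cache = {}

    def wrapper(*args):
        if args not in cache:
            cache[args] = fetch(*args)  # first request: ask the system
        return cache[args]              # later requests: serve from the cache

    return wrapper


# Routines a maxim might be given, named after the calls they emulate.
uname = cached(lambda: (platform.system(), platform.node(), platform.release()))
stat = cached(os.stat)

# Demonstration with a counting stand-in for an expensive account lookup.
calls = []


def slow_lookup(user_id):
    calls.append(user_id)
    return {"uid": user_id, "name": "user%d" % user_id}


getpwuid = cached(slow_lookup)
first = getpwuid(0)
second = getpwuid(0)
```

Only the first call to getpwuid(0) reaches slow_lookup. Since maxims would see only the wrapped routines, the wrapper is also the single place where access to system information could be restricted or logged.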


Interface Between Front-End and Inference Engine 


The inference engine is passed a reference to an associative array which contains the accumulated 
rules from the evaluation of the various maxims. The inference engine returns to the front-end an
array of successful exploits, if any were found, in the form of plans. If no exploits were discovered, 
then a message is printed by the front-end stating as much. If the return value is non-empty, then a 
subroutine within the front-end is invoked which will cause each exploit to be printed as a chain of 
states and a description of how an intruder would transition from one state to the next. 


Syntax of Rules 

The generated rules that are produced when a security deficiency is found must be able to express 
that deficiency. At present, the KuangPlus prototype has borrowed many of the details from the
original Kuang tool; we expect that other states and transitions will be required to represent more
sophisticated system security deficiencies.

To review: a rule describes a security deficiency, and has an initial state, a final state, and a method 
of transitioning from the initial to the final state. The state types available in the prototype are: 

user-id 

A particular numeric user-id on a Unix system 

group-id 

A particular numeric group-id on a Unix system 

pathname 

A full pathname for a file on a Unix system 

version 

The version details for a particular piece of software

The transition operators in the prototype are:

• “can obtain user-id” 

• “can obtain group-id” 

• “can overwrite” 

• “can replace” 

• “is version” 

The KuangPlus prototype encodes both state information and the transition operators in order to 
reduce the storage size of each rule, and to make rule chaining more efficient. 

Some example rules (in initial state, operator, final state format) are given below. Symbolic 
user/group identifiers have been substituted for numeric ones. 

1. User-id any, is version Sendmail 5.64, User-id root 

2. Group-id wheel, can overwrite /etc/passwd, User-id root 

3. User-id operator, can overwrite /etc/group, Group-id wheel 

4. User-id any, can overwrite /home/staff/fred/.cshrc, User-id fred 

5. User-id any, can overwrite /home/staff/fred/.cshrc, Group-id operator 

Having knowledge of the above rules, any user on the system can gain root privileges via two 
routes. Because Sendmail 5.64 is installed, a user can exploit bugs in this service to become root 
immediately. 

Alternatively, a user could chain rules 5, 3 and 2 together as follows: overwrite 
/home/staff/fred/.cshrc to obtain operator group permissions, overwrite /etc/group to obtain group 
wheel permissions, then finally overwrite /etc/passwd to obtain root privileges. 

An Example Maxim 

The following is an example of a simple KuangPlus maxim to be executed in the ‘Safe’ 
environment. The maxim detects a security problem if the running Linux kernel is too old. 

    package Linux;

    main::pdebug "Loading Linux rule now", 6.1;

    # CIAC_J035 -
    # ESB-1999.039 -- CIAC Bulletin J-035
    # Linux Blind TCP Spoofing
    # 22 March 1999
    sub CIAC_J035 {
        my ($description) = "CIAC_J035";

        # The uname array has something like this in it:
        # Linux (none) 2.0.34 #1 Fri May 8 16:05:57 EDT 1998 i486 unknown
        my ($version) = (main::uname())[2];
        @frag = split('\.', $version);

        if (($frag[0] <= 2) && ($frag[1] <= 0) && ($frag[2] <= 36)) {
            # This rule is triggered if the version of the
            # Linux kernel running on this machine is less
            # than a known level. Note that we record the
            # string "Linux-$version" using the kernel
            # version of this system so that the "known_facts"
            # will match it when the "rule_engine" chews over all
            # the plans.
            $main::new_plans{"u -1 v Linux-$version"} = $description;
        }
    }

    my (@uname) = main::uname();
    if ("$uname[0]" =~ /Linux/) {
        main::pdebug "Invoking Linux rule set", 6.3;
        CIAC_J035();
    } else {
        main::pdebug "Skipping run_Linux rules", 6.3;
    }

This maxim, once loaded by the KuangPlus front-end, will determine if the cached system uname 
information contains the word ‘Linux’. If so, the CIAC_J035 subroutine is invoked. This tests to see
if the version of the Linux kernel is below 2.0.36. If so, the rule u -1 v Linux-$version is 
generated and exported back to the front-end. The meaning of the encoded rule is: if the system is 
using a Linux kernel below 2.0.36, then any external user can become any real user-id on the 
system. 
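
The maxim above compares each field of the kernel version separately using `<=`. That is not quite the same test as "below 2.0.36": a kernel at exactly 2.0.36 passes the field-by-field check, while an older kernel such as 1.2.13 does not. The following Python sketch contrasts the two tests. The function names are hypothetical, and it assumes purely numeric dotted versions; suffixes such as "-pre1" would need extra handling.

```python
def parse_version(text):
    """Turn a dotted version such as '2.0.34' into the tuple (2, 0, 34)."""
    return tuple(int(part) for part in text.split("."))


def is_below(version, threshold):
    """Strict ordering: compare the versions as whole tuples."""
    return parse_version(version) < parse_version(threshold)


def fieldwise_at_most(version, threshold):
    """Field-by-field test in the style of the maxim: every field <= its limit."""
    pairs = zip(parse_version(version), parse_version(threshold))
    return all(field <= limit for field, limit in pairs)


# Versions on which the two tests agree and disagree.
results = {
    version: (is_below(version, "2.0.36"), fieldwise_at_most(version, "2.0.36"))
    for version in ("2.0.34", "2.0.36", "1.2.13", "2.2.5")
}
```

Tuple comparison orders versions by major number first, then minor, then patch level, which matches the usual reading of "below a known level".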

Sample Output from Prototype 

When a plan (a chain of rules) has been found that leads to a desired final state (such as obtaining 
root privileges), KuangPlus can print out the plan in encoded format, or with textual details 
provided by the maxims themselves. 

The following example shows a verbose-style plan, where a Linux system has been “seeded” with 
a world-writeable /etc/group file and is running a 2.0.34 Linux kernel. 

Success: "u 0 w /etc/passwd g 0 w /etc/group u .* v Linux-2.0.34" 

The verbose breakdown follows: 

The goal is "u 0" 

Plan: "u 0 w /etc/passwd" 

(root access via. writeable /etc/passwd) 

(verbose) you can get access to userid 0 if 
(verbose) you can overwrite file /etc/passwd if 
Plan: "w /etc/passwd g 0" 

(/etc/passwd is writeable by gid 0) 

(verbose) you can overwrite file /etc/passwd if 
(verbose) you can get access to the gid 0 if 
Regex Plan: g .* w /etc/group == g 0 w /etc/group
(/etc/group can be overwritten) 

Plan: "w /etc/group u .*"

(/etc/group is world writeable)

(verbose) you can overwrite file /etc/group if 
(verbose) you can get access to userid .* if 
Regex Plan: u -1 v Linux-2.0.34 == u .* v Linux-2.0.34
(CIAC_J035) 

Known: "v Linux-2.0.34"

(From POSIX::uname() call.) 

With a Linux 2.0.34 kernel and a world-writable /etc/group file, any external user can obtain root 
privileges on this system. 
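
The encoded plan strings shown above appear to be sequences of code/value pairs. The following Python sketch unpacks such a string and prints a verbose form. It is an illustration, not the prototype's code: the meaning of the letters u, g, w and v is inferred from the sample output, the wording for v is an assumption, and the sketch assumes that values contain no spaces.

```python
# Phrases for each state code, following the verbose sample output.
PHRASES = {
    "u": "get access to userid {}",
    "g": "get access to the gid {}",
    "w": "overwrite file {}",
    "v": "find version {}",  # wording assumed; not taken from the sample output
}


def decode_plan(encoded):
    """Split an encoded plan into a list of (code, value) pairs."""
    tokens = encoded.split()
    if len(tokens) % 2:
        raise ValueError("expected an even number of tokens (code/value pairs)")
    return [(tokens[i], tokens[i + 1]) for i in range(0, len(tokens), 2)]


def describe(encoded):
    """Render an encoded plan as a chain of 'you can ... if' lines."""
    lines = ["you can " + PHRASES[code].format(value)
             for code, value in decode_plan(encoded)]
    return " if\n".join(lines)


steps = decode_plan("u 0 w /etc/passwd g 0 w /etc/group u .* v Linux-2.0.34")
text = describe("u 0 w /etc/passwd")
```

Each pair names a state, and reading the pairs from left to right walks backwards from the goal to the known fact that makes the plan possible.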

Current Status 

At present, no further work has been done on KuangPlus since the completion of Jeff Howard’s 
Masters’ project. The KuangPlus prototype that he developed is available at 
http://minnie.cs.adfa.edu.au/KuangPlus, along with his project report. 

We expect that the prototype will form the basis of a fully-formed version of KuangPlus. One issue
that is as yet unresolved is the choice of an appropriate system of digital signatures to determine a
maxim’s author. The ‘Pretty Good Privacy’ package developed by Phil Zimmermann was considered
a likely candidate for this role. Whilst developing the prototype, the authors became aware of the
‘Penguin’ module for Perl which seems tailor-made for this role. Penguin is described as having
“vastly simplified, superior, and innate methods of ensuring safety and security”. To date, Penguin
is not part of the standard Perl distribution, but it is considered a very likely candidate for future
inclusion.

In order for KuangPlus to be valuable, not only must a complete version of the tool exist, but a 
critical mass of vendors must supply vulnerability reports in KuangPlus maxim format. During the 
production of the complete KuangPlus, we will need to encourage vendors and other security 
interest groups to provide maxim-format reports. We will also produce a base set of KuangPlus 
maxims to detect many of the common and well-known Unix security vulnerabilities. 

Although the prototype of KuangPlus was developed on a Linux system, one prime goal was for it 
to be highly portable. The prototype core works ‘as is’ on FreeBSD systems, and during the 
development of the complete version, we want to ensure that the core executes on such diverse 
platforms as Linux, FreeBSD, Solaris, NT and MacOS. Of course, some maxims can be shared 
across two or more platforms, but many maxims will be specific to a single operating system. 

Conclusion 

A new computer security tool, KuangPlus, has been designed and prototyped. The design uses ‘on 
the fly’ loading to access a database of known security vulnerabilities which would be interpreted to 
produce a number of existing vulnerability rules. The rules can then be assessed by a backward
chaining, goal based, breadth first inference engine. Any vulnerabilities detected would be reported 
to the system administrator. The existing prototype meets most of the design goals chosen for 
KuangPlus. 

Future development of the prototype includes developing automatic methods for authenticating the
author of maxims and generating interest for the tool in the security community, so as to open the
prototype to scrutiny and to encourage the writing of maxims.

The development of KuangPlus along the lines presented above has the capacity to create a tool
which is general enough to run on just about any computing platform. Combined with a rich and 
timely set of maxims, KuangPlus would enable any publicly identified vulnerability, in either the 
configuration or the software running on the system, to be exposed pro-actively by the system 
administrator. 


Warren Toomey 
2000-05-25


Footnote Presentation
Security for the Rest of Us 
Paul Russell, 

Linux Kernel IP Firewall Maintainer 


Biography 

Paul Russell, sometimes known as 'Rusty', has been working with Linux-based 
Internet firewalls and security since 1993. He began making modifications to the 
Linux kernel firewall module (ip_fw) throughout 1997, culminating with him 
becoming the Linux Kernel IP Firewall Maintainer in 1998. WatchGuard, Inc, hired 
him to develop an enhanced firewalling infrastructure for future versions of Linux, 
beginning in the 2.3 kernel series. He has been a regular columnist for Linux 
Magazine, and has delivered tutorials and papers at Linux Kongress 1998 and 1999,
LinuxWorld March 1999 and August 1999, and the Conference of Australian Linux Users
1999 (which he helped to organise). He is the author of the Linux Kernel Locking
HOWTO, the Linux Kernel Hacking HOWTO, and various Linux
firewalling guides. He currently works for Linuxcare (Canberra).


Keynote Presentation

Evangelising Open Source in the Enterprise 

David L. Sifry 


Biography 

David Sifry, Linuxcare co-founder and Chief Technical Officer, is a recognised expert 
on Open Source development and the Linux operating system. Himself an Open 
Source developer, Sifry has contributed code to such projects as GNU Emacs, 
Majordomo, packetrace, jitterbug, and to the Linux kernel itself. In addition, he 
managed the development of SecureVPS, an Open Source virtual private networking 
server for Linux. His activities in the Linux community include service as
vice-president of the Bay Area Linux Users Group (BALUG).


Invited Presentation
1001 Reasons to Hate Linux 
Darren Reed 


Biography 

Darren Reed is the primary author of IPFilter, a widely used packet filtering package
for Open Systems, and a frequent contributor to a number of security discussion groups
and mailing lists.


AUUG2K


ISOC-AU: The Internet for Everyone! 


Abstract 


Delivering on the vision that “The Internet is for Everyone!” brings with it substantial technical
challenges including: delivering increased bandwidth at the right price, increasing the number of 
users by an order of magnitude, increasing the number of internet devices by three or more orders 
of magnitude, and moving away from the traditional personal computer Internet interface. Security, 
authentication, ease of use, individualised interfaces and regional and remote delivery are all 
important technical challenges. 


However, development of the Internet is not solely a technical challenge. For instance, over the last 
year the Internet Society of Australia has been deeply involved in the debate over Internet 
censorship. Similarly, the struggle between open and proprietary standards is as much a social as a 
technical issue. ISOC-AU will continue to be active in these areas providing sound technical 
comment and a users’ perspective on emerging Internet issues. We are also seeking to influence
the direction in which the Internet develops by facilitating an Internet Cooperative Research Centre 
in Australia. 


Introduction 


Internet use continues to grow at fantastic rates, reaching into our business and personal lives. As 
this growth continues, some may think that the challenges of the Internet have been solved. In 
contrast, I would argue that the major challenges have just begun, particularly when you try to
deliver on the vision that “The Internet is for Everyone!”


At ISOC-AU, we have found that the challenges are not just technical but also social. ISOC-AU 
has been actively working on a number of key issues, including Internet censorship, from a sound 
base of technical knowledge. In the longer term we see a need for building user requirements into 
the technical design process as a means of developing the Internet for use by everyone. 


Technology Base and User Benefits 

As a computer based technology, the Internet has already been amazingly popular and it is
delivering significant benefits. But change is slow and many users have not even exploited most of 
the power of the existing technology. In the world of Microsoft products, for instance, I am sure 
that most corporate users would understand only 50% of the power already built into Word. And 
Word is only just being modified to include some open source elements such as HTML. Indeed, 
economists are still debating whether the impact of computer technology is showing up in 
productivity measures. 


But this is an open standards conference. One challenge for open standards is to deliver powerful 
benefits to users from a more transparent technology base. There is good potential from a business 
model with open source software and commercial delivery of services and support. Given the 
challenges of installing Linux for desktop use, there is plenty of money to be made from support 
and customisation :-). However, beyond the issue of extracting benefit from existing computer based
technology there is a question of extracting value from convergence of computing and 
telecommunications. 


The Internet will become the most pervasive communications technology since the telephone. The 
combination of digital data processing power with world wide, mobile communications technology 
will produce even more significant impacts. To follow the comments of Bill St. Arnaud, Senior
Director Network Projects, CANARIE, there has been growth in person to computer and person to 
person communication through the Internet. We may well see the biggest future growth in 
computer to computer communication. 


Whether the Internet remains a computer based technology is now open to challenge. This is where 
I start to get in over my head with some new technologies. Wireless and WAP are likely to be early 
candidates. But if you can deliver information over WAP why not eliminate this element and just 
deliver Internet content tailored to your mobile device using XML? Bandwidth and screen size are 
limitations. Then there is 3G mobile, perhaps with full Internet content. Internet appliances enter 
the scene. And what about Bluetooth? Can we call this approach the “multi mobile Internet”?


One model of the benefits from the multi mobile Internet is to deliver current benefits in more and 
more places, eg sporting results and stock prices anywhere anytime, backed by powerful computing 
resources, ASPs and databases. Result: mass production in cyberspace or very large numbers of 
devices, each generating small revenues to produce large profits. In some ways, the multi mobile 
approach could be seen as a one to many communications arrangement, similar to broadcasting. 
Maybe I am old fashioned and attached to computing power in my own desktop unit but I get a 
sense of loss from this model. 


In contrast, the current model of the Internet is based on a many to many structure, with a relatively 
simple (dumb?) network surrounded by smart devices (computers)! This model produces a 
significant challenge to Telcos who rely on providing value inside the network for their bread and 
butter. But for me as a user, it provides greatly increased levels of control. I know that I can store
important data on my hard drive (and back it up). I know fairly predictably what my software will
do. (Although as my daughter commented the other day, “computers always seem to need fixing”.)
I can employ the intelligence of my own device to provide additional value. In this context, some 
of the things I need are: 

simpler interfaces 

easier setup 

better delivery of user value 


Would I have sufficient trust to move these personal resources out onto a service provider on the
net? Some pundits are suggesting that service level agreements or SLAs are the answer to user trust
on these issues. Maybe there is an opportunity for the open source community here. Greater
transparency can provide greater trust. I don’t really know the answer to all this. If I did, I guess I
would be making money instead of standing here :-).


As you can see, we have not even begun to tap the full power of the technology for many users. 
Remember that power users are mostly running Microsoft products over a 56k link at best. Better 
technologies are becoming available daily. In Australia, we will see the battle of the broadband 
connectivity suppliers over the next 12 months. These are substantial technology changes that will 
make possible a new generation of popular applications. But for a user they are not the only 
questions. Two key questions are: will I be able to get affordable access and if I can get access 
what benefits or costs will it bring? In a social sense, the potential changes from these technology 
developments are as large as those from the industrial revolution. We are facing implications that 
are just as significant. 


Social Issues and Debate 


In the policy arena these challenges are emerging daily. The utopian, perhaps anarchistic, vision of 
information for everyone and many to many connectivity, does not always meet a favourable 
response when it challenges existing positions and structures. 


Censorship has been the prime example of doubts about easy access to all types of information on 
the Internet. Early last year, with the pressures of legislating sale of the second portion of Telstra 
bearing down, the government saw an opportunity to respond to sectional concerns about 
pornography by launching an Internet censorship regime. I don’t know whether producers of
filtering software also inspired this activity but the result has been greatly in their interests. 
Initially, it looked as though the government was looking for central points to filter Internet 
content. For those with even a little technical knowledge of the net, this was a stunning case of 
turning the world on its head! ISOC-AU worked hard with other individuals and organisations to
try to explain the technical fundamentals and lobby against the legislation. 


Without revisiting all the heated discussions, in the end we did get Internet censorship legislation in 
the form of the Broadcasting Services Amendment Act 1999. This legislation provides for people 
to complain about content to the Australian Broadcasting Authority. It provides for take down 
notices to Australian Internet Content Hosts (ICHs). And, it provides for ISPs to be required to 
block restricted international content. In the legislation, failure to meet requirements can attract 
fines up to $25,000 per day. This legislation was a poor outcome for users, the industry and the 
Internet. 


Then came the political solution. Under the legislation, if you comply with a relevant industry code 
then you don’t have to undertake filtering of international content. We are yet to see a test case on
how this arrangement works. But many ISPs have been careful to provide access to filtering 
software for users to load on their own computers. Are there any filtering products for Linux or 
Unix, I wonder? No one has yet been able to confirm whether the endorsed products predictably 
protect users from every piece of undesirable content. Indeed, if parents rely solely on filtering 
software to protect kiddies from undesirable content, they may be led into a false sense of security. 


For those with a technical orientation, this messy solution was no solution. It holds no technical 
logic but it does have plenty of political logic. What a contrast there is between social solutions and 
technical solutions! When we look at all the issues that needed to be or could have been addressed 
during 1999, why were we focused on Internet censorship? 


For the record, ISOC-AU regards the censorship issue as not closed. We have called on the 
government a number of times to remove the requirements on ISPs from the Act. This would leave 
the complaints provisions and the take down notices. The legislation is due for review and we will 
continue to press our views. This position is important because the legislation risks setting a
precedent that social problems can be addressed primarily through technical solutions. If that line
of thinking were to continue we would suffer severe damage to the development of the Internet. 
The next cab off the rank is on-line gambling. There are many social problems associated with 
gambling whether in real life or online, but we do not want to see inept technical solutions put 
forward to address substantial social problems. 


I do not propose to go through other social issues in similar detail but there are important lessons to 
be learnt. The censorship issue has shown us that social factors have the potential to substantially 
influence implementation of technology. This may be no news to those who continue to see social 
benefits from their technology work. Indeed, I am still a believer in the vision that the Internet can 
deliver substantial social benefit. But anyone who thinks that gaining benefit from technology is 
just a matter of technical development and implementation should think again. 


Social Challenges


The Internet is fundamentally changing power structures and rights in the community. These 
changes are a recipe for intensive public debate and strange political solutions. In some situations 
we are seeing the current rights of citizens being eroded by the power of convergent technology, 
including privacy, security and consumer rights. In other situations, such as copyright, markets and 
royalty income, we are seeing the position of businesses being significantly changed. The ability to 
do international transactions over the net just as easily as local transactions raises the issue of 
enforcement of rights internationally for both citizens and businesses. These are all areas where 
social questions are overlapping with technical considerations. As we come to move many more of 
our transactions onto the Internet, we will come to see even more significant social implications. 

As many have said before, information will become the essential element of transactions leading to 
emergence of the information society and economy. 


In the growing information economy, business to business (B2B) transactions are the area attracting 
most interest. Major global firms are moving strongly into this area in an effort to cut costs, reduce 
inventories and cut risk. The Chairman of the US Federal Reserve, Alan Greenspan, is eloquent on 
this topic. His speeches provide a detailed account of the impact of computer technology and the 
Internet on increased business productivity. It is hard to avoid the conclusion that Greenspan 
considers that US growth rates have been higher and interest rates have been lower than they 
otherwise would have been due to the impact of technology. But he is not a utopian! Interest rates 
have been increased. The NASDAQ tree has been shaken and some companies have fallen. Now 
the US rate of growth is starting to ease. 


The major public examples of B2B Internet use are the major procurement operations being 
established. In the US the large car manufacturers have set up their joint procurement network. For 
example, the Ford motor company has bought $US78 million worth of auto parts in its first online 
auction. This purchase saved the company $US10 million. Australian mining giants, RioTinto, 
WMC and BHP have joined 11 global rivals in a joint venture to purchase more than $300 billion 
of supplies over the Internet. There is also Global Net X-change, a venture between six retailers 
including Coles Myer, with an estimated buying power of $350 billion. Big companies with access 
to substantial technical resources are just beginning to use Internet technology for their own benefit 
and the dollar values are already huge. Is the net just about large players using new technology to 
put the squeeze on suppliers and little guys? Maybe. 


In the business to consumer (B2C) area, action is not quite so strong. Amazon is still there, 
although investor confidence is starting to wane with the continuing losses. Auction sites are 
beginning to enter the market and online banking seems to have found a niche. But this is an area
where businesses are finding the value challenge a little more difficult. Only 6% of Australian
adults used the net last year to purchase or order goods or services for their own use. There is still a 
trust challenge for dealing with companies you cannot physically meet. Online businesses with 
reputable approaches to business will benefit from word of mouth but more work needs to be done
on security, privacy, authentication and consumer rights before this situation will change 
substantially. Many people are wary of making payments over the net. 


As a consumer, why would you bother with these online challenges when you have an equally 
competitive supplier just down the street? The answer lies in the value of the offering. Businesses 
need to find out how to use the Internet to deliver better value to customers. Amazon started down 
this road but much more work needs to be done. 


User Requirements 


Have we come all this way with the Internet, just to provide slightly cheaper products through B2C 
and enhance large company purchasing through B2B? Or is there still more to the Internet dream? 

I am suggesting that we can achieve more through turning the initials around. Let's start looking for 
some C2B, instead of B2C. Let's start to deliver on the dream of many-to-many communication 
and a network with intelligence distributed to the edges. There is a major opportunity for 
consumers to access and share information in a way that revolutionises the business to consumer 
relationship. Open systems continue to provide the technologies that support the Internet. Can they 
meet the challenge of taking user benefits to a new quantum level? 


In ISOC-AU we have been developing a user perspective with a sound base of technical 
understanding. We have used this perspective in our work on public issues. Over recent months 
we have been seeking to inject this perspective into development of the Internet. However, I have 
got to say that when some geeks are absorbed in their technical triumphs and business people are 
looking at making a fortune in a week, it is a little hard to get their attention! Even so, I think we 
have made some progress. 


The model we have supported was to inject user requirements into the technical design process. It 
is based on work that has been done at the Centre for International Research on Information and 
Communications Technology (CIRCIT) and was expanded with cooperation from universities and 
the Australian Library and Information Association. One of the roles of the Internet Society is to 
foster research about the Internet and internetworking. We have been helping to facilitate a 
Cooperative Research Centre for Internet technology. In some ways the CRC model is very suited 
to the type of issues that surround the Internet. It seeks to bring users, companies and researchers 
together in formal, strategic relationships and is very focused on achieving real world applications 
not just conducting research for its own sake. However, some aspects of the model have not been 
so helpful. The CRC calls for research over a long period of up to seven years. No one really 
knows what will happen in one year on the Internet, let alone seven years. Also, the CRC must 
have a primary focus on natural sciences and engineering. This means that any social research on 
user needs takes more of a back seat. 



Even so, we have worked hard over the last nine months to press forward the relevance of taking 


The Giga-Network 

Geoff Huston 

ABSTRACT 

Network technologies continue to expand in speed, volume and number of clients. Over the next couple 
of years we will see the advent of the Giga Network, where the basic unit of fixed connectivity will be 
measured in gigabits per second. The network itself will be supporting giga-streams, composed of 
thousands to millions of megabit streams. The network will also scale up to supporting giga-connections, 
with over 1 billion attached devices, predominantly from the mobile device sector. 

What are the challenges facing us to engineer a network to scale to such Giga-dimensions? What 
technologies will be critical? What will be discarded on the way? This session will explore some aspects 
of the Giga-network with a particular focus on the engineering challenges presented by such a network. 


Toaster: A High Speed Packet Processing 
Engine 



Andrew McRae 

Distinguished Engineer 
Cisco Systems Australia 

Email: amcrae@cisco.com 


Abstract 

The Internet explosion has become a reality, where the number of users and the amount of data 
traversing the Net have doubled several times in the last few years. Whilst this has changed the way 
we work, live and play, the plumbers responsible for keeping the bits flowing have sometimes had a 
hard time keeping up with the dramatic growth in demand for services and bandwidth. 

In the last 5 years the equipment used to run the Internet has morphed from simple routers to optical 
switches. In conjunction with this, the advent of the so-called New World of telecommunications 
(where traditional connection-driven modes of voice and service delivery are being supplanted by 
integrated Internet Protocol services) has demanded new levels of intelligent classification and 
control. 

One effect of this demand is the development and introduction of specialised processing engines 
dedicated to networking, generally termed Communication Processors. These range from 
simple microcoded engines to full blown dedicated CPUs. 

This presentation examines the evolution of this class of processors, and discusses the underlying 
motivations and requirements. 


Introduction 

With the rapid deployment of integrated digital networks, there is a vast demand for higher speed 
and more sophisticated devices to run these networks. This paper examines one aspect of the 
developments required to meet this demand, that of the design and implementation of a new 
specialised communications processor. 

Cisco has traditionally been a user of off-the-shelf processors (including Communication 
Processors), but with the growing demand for faster and more powerful switches it has become clear 
that developing Cisco’s own Communication Processor would be the only way of delivering 
products that met the challenge. This presentation describes the result of this effort from a technical 
viewpoint. The processor is officially known as PXF (Parallel eXpress Forwarder), but nicknamed 
Toaster (which is a lot easier to say). 

The architecture of the processor is described, highlighting areas where the processor was 
specifically tailored for processing packets, and showing how such a processor differs significantly 
from typical CPUs. The challenges of building such a processor are described, and some results 
presented indicating how processing compares to more traditional packet forwarding methods. 

New Technologies and Features 

There are two basic pressures that exist in the development of new Internet devices: bandwidth and 
features. 

The bandwidth pressure stems from the large scale deployment of Internet related communications 
using new (xDSL) and existing technologies (ISDN), and from the deployment of newer 
technologies such as: 

• Gigabit Ethernet. Newer laser and fibre optic technology now allows Gigabit Ethernet to 
operate at distances upwards of 70 or 80 km. Gigabit Ethernet is now being used for high 
speed uplinks for service providers and enterprise organisations, as well as high speed trunks 
within a building. The ubiquitous nature of Ethernet and the ease of interconnection have 
created an infrastructure that will seamlessly operate across 3 orders of magnitude of bandwidth 
(from 10 Mbit/s up to 10 Gbit/s). 

• Fibre optic. The reselling and availability of so-called ‘dark’ fibre (i.e. fibre optic connections 
where the customer provides the equipment at each end without an intervening provider 
dictating the connection protocol) has allowed the deployment of POS (Packet Over SONET) 
interconnects at speeds ranging from OC-3 up to OC-192. 

• Wireless Technology. Higher and higher speeds are now available for wireless operation, and 
rapid development of new protocols for metropolitan area wireless networking and the 
emergence of devices that incorporate wireless technology will accelerate this market. 

In Australia, we are somewhat sheltered from the harsh and unforgiving world of high bandwidth 
availability to the average user, presumably because if we had high bandwidth, we wouldn’t know 
what to do with it. However, sometime in the future it may eventuate that ADSL or cable modem 
coverage will improve. 

Apart from the promise (or otherwise) of higher bandwidth, the other pressure that is present is the 
integration and support of new features or protocols. The Internet has been a fertile proving ground 
for the development of new technology, and even though recent attempts have been made to 
increase the core robustness, there is still a rapid uptake of new features. 

This creates an ever-growing set of ‘core’ requirements that Internet devices must encompass to 
operate in the Internet sphere. Products that do not keep up are obsoleted. Some of these features 
are: 


• NAT (Network Address Translation). NAT (along with CIDR) has been one of the main 
reasons why the exhaustion of network addresses has been arrested. It also allows a clean 
separation between private network addressed domains and the public Internet. However, 
NAT can be expensive to do, since it requires every packet to be examined and the addresses 
to be modified - in some cases, even the TCP/UDP port numbers are translated. 

• Security issues. With the recent Denial Of Service attacks on major Internet servers, it is clear 
that the Internet is not immune to vandalism, and in spite of the fact that such attacks may 
have a social issue at their source (and consequently may be best addressed in a legal or social 
forum), the technology of the Internet must show itself to be robust against such attacks. This 
may manifest itself as better firewall mechanisms (security access lists etc.), more responsive 
and intelligent intrusion detection methods, or ways of protecting servers themselves against 
resource exhaustion. 

• Quality of Service (QoS). QoS has been a hot topic for Internet research. The old philosophy 
of ‘best effort delivery’ may not be suitable in a world where some customers prefer to pay 
for ‘get it there or else’, and Differentiated Services allow for different delivery, billing and 
priority models. Protocols such as RSVP allow end-to-end reservation of bandwidth. 

• Integrated Voice/Video. A popular model of operation that has emerged is the integrated 
network, where a single IP based network carries pure data protocols (web, application data, 
remote sessions etc.) as well as voice (VoIP) and video. Whilst many VoIP applications 
rely on QoS for effective operation, often the underlying transport devices must perform 
other services for these applications, such as finer grained fragmentation. 

• Application protocol acceleration. As new applications emerge, it is clear that new transport 
layer protocols often accompany these applications, and with these new protocols come new 
requirements for network devices. For example, even with HTTP, new classes of devices are 
appearing that perform load balancing or switching based on the URL data within the 
application protocol itself, and caching devices exist that accelerate the overall network by 
caching the data at a more convenient location. 

• New Protocols. New protocols emerge often as a result of new techniques or a better 
understanding of networks. For example, Multi-protocol Label Switching (MPLS) grew out of a 
desire to run IP more efficiently over networks like ATM, but has since been seen as a technology 
that can be applied to Virtual Private Networks (VPNs). 

• Future Protocols. IPv6 is still planned as the ‘next generation’ core protocol for the Internet, 
though it remains to be seen just when the large scale deployment of IPv6 will occur. In any 
case, it is likely that network devices will need to run IPv6 at some point in the future. 
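
To make the per-packet cost of a feature such as NAT a little more concrete, the following is a minimal sketch of port-translating NAT in Python. It is an illustration only, not how any particular router implements it: the class name PortNat, the Packet fields and the starting port number are all assumptions made for the example, and a real device would also have to update the IP and TCP/UDP checksums and age out idle mappings.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    """Only the header fields that NAT needs to look at (hypothetical layout)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class PortNat:
    """Toy source NAT: hides private (address, port) pairs behind one public address."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out_map = {}  # (private ip, private port) -> public port
        self.in_map = {}   # public port -> (private ip, private port)

    def outbound(self, pkt):
        """Rewrite the source address and port of a packet leaving the private network."""
        key = (pkt.src_ip, pkt.src_port)
        if key not in self.out_map:
            self.out_map[key] = self.next_port
            self.in_map[self.next_port] = key
            self.next_port += 1
        return replace(pkt, src_ip=self.public_ip, src_port=self.out_map[key])

    def inbound(self, pkt):
        """Restore the private destination of a returning packet, or None if unknown."""
        key = self.in_map.get(pkt.dst_port)
        if key is None:
            return None  # no mapping exists, so the packet is dropped
        return replace(pkt, dst_ip=key[0], dst_port=key[1])
```

Even in this toy form, every packet needs a table lookup and a header rewrite in each direction, which is the work the forwarding path has to absorb at line rate.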

The existence of these pressures on the development of network devices has produced an 
interesting challenge. Everybody wants the devices to run 10 (or even 100) times faster, but to do 
10 (or 100) times as much work! To put it in crude plumbing terms, it is as if we demanded from 
our water supplier enough water to fill our swimming pool in a couple of hours, but we also want 
separate pipes for hot water, cold water, spring water, and water with fertiliser in it for the lawn. 

Router Evolution 

To understand the environment and need for Network Processors, it is useful to review the 
evolution of routers and network devices. 

Early routers were simply general purpose embedded systems with network interfaces attached. The 
network interfaces would DMA network packets into a common memory, and the CPU would 
examine and process the packets, and then transmit the packet to the output interface. 

Whilst this style of router was very general purpose and flexible, the speed of network interfaces 
supported was limited to lower speed serial lines and LAN interfaces (1 Mbps up to 45 Mbps). As 
CPU speeds increased, the amount of packet processing could increase, but the memory subsystem 
rapidly became a bottleneck. 



Later generations of routers created a better I/O architecture for packet processing, where some 
faster dedicated memory was used to hold packets in transit, and the CPU had a separate memory 
bank for code and data tables. Sometimes specialised ASICs were used to provide hardware assist 
(e.g. filtering, compression, encryption etc.). The CPU was still involved in the forwarding of every 
packet, but the main memory bank was no longer the bottleneck. Much of the performance is 
dependent upon careful tuning of CPU access to the shared packet memory. One part of this was 
the need for the CPU to limit its view of the network packets to just the header, sometimes using a 
cached write-through view of the packet memory so that access to multiple header fields could be 
done efficiently. 

This architecture is typical of many routers available on the market today. 

As the core Internet developed in performance requirements, and fibre optic interface speeds 
advanced, newer architectures evolved that employed central crossbar switch matrices fed by high 
speed line cards (as shown below). 


[Figure: high speed crossbar fabric] 

This architecture allowed parallel processing of network packets, as well as providing redundancy 
of processing. Each line card may be a simple hardware line interface, or there may be a local CPU 
providing some intelligence, or a custom ASIC may be used to provide faster feature processing. 
The higher cost of these architectures meant that only core routers were implemented this way. The 
use of CPUs in these line cards meant that more features could be supported, but at a high 
performance cost because of the need to integrate the CPU into the packet path. 

However, as higher bandwidth options lowered in cost and became commonly available, faster 
processing was increasingly required at the edge of the networks, which was also where other more 
sophisticated features were applied (NAT, Security, QoS etc.). 

Why Network Processors? 

An interesting divergence has occurred in the last few years in the world of CPUs. Traditionally, 
CPU designers and manufacturers have targeted CPUs at different markets, reflecting the cost or 
performance required. Typical microprocessors were aimed at servers, workstations or PCs. The 
workloads expected of these CPUs were generally considered similar, though some systems were 
optimised for graphics performance (often through the use of dedicated co-processors). Much 
computer science study has centred around the architectural and performance tradeoffs of these 
CPUs, leading to the development of RISC CPUs and other high speed CPUs. A typical CPU these 
days is orientated around a high speed central core with a multi-level cache arrangement to reduce 
the performance hit of accessing slower main memory. The I/O requirements of processors are 
limited to devices that DMA into memory ready for processing by the CPU. Scaling of processing 
tasks by general purpose CPUs has been driven in two directions: increasing clock speed, and the 
use of multiple CPUs. Vendors such as Sun Microsystems have very successfully scaled the 
performance of the SPARC architecture by concentrating heavily on symmetric multiprocessing. 

Variants of these CPUs were often produced by the designers aimed at particular markets, such as 
the embedded market. Usually, a different product cost/performance tradeoff was required, and 
typically with these embedded CPUs, a number of support devices are integrated with the CPU to 
reduce the overall number of external peripheral devices. These embedded CPUs were often used 
in routers and switches, as well as a myriad of other devices. 

An alternative approach to embedded CPUs and general purpose CPUs was the development of 
dedicated ASICs, designed specifically for packet network processing. Typically, these ASICs were 
proprietary chips, tightly coupled to a specific product’s architecture and design. One advantage of 
these ASICs is that the packet performance is considerably greater than that of a general purpose 
CPU, because the ASIC has fixed high speed logic replacing the general purpose instruction stream. 
This fixed logic is, of course, also the main disadvantage of dedicated ASICs: the time to design and 
craft the final product can be as long as 12 months, and the result is inflexible; if new switching 
algorithms or protocols need to be supported, a whole new ASIC needs to be designed. 

The common feature of the embedded CPUs was that the CPU was still a general purpose CPU, 
albeit with extra support or integration making it attractive in that environment, and the design was 
orientated around the original general purpose workload. 

This workload is actually very disjoint from the optimal workload for devices performing high 
speed processing of network packets, and as routers evolved through the designs shown, it was 
becoming increasingly clear that general purpose CPUs were not suitable for more advanced 
processing of network packets, for the following reasons: 

• The memory architecture of general purpose CPUs essentially involves a hierarchy of memory 
starting at primary cache, secondary cache, DRAM, mass storage etc. The design of the 
memory architecture centres around the CPU having fast access to a large memory space, 
with cache designs maximising bus utilisation. 

• So that packets can be processed easily by CPUs, the packets are usually DMA’ed into some 
fast memory that allows dual-ported access by network devices and the CPU. However, 
whilst this architecture suits the CPU, it does require that the network packet traverses the 
memory bus twice. Only by using very high speed SRAM can the faster interfaces be 
supported, and even then the size and cost limitations of SRAM means that only a limited 
amount of memory can be supported. 

• The cache architecture of general purpose CPUs does not fit the short-term processing of packet 
headers. 

• The memory bandwidth of general purpose CPUs is not great enough to provide high speed 
processing of network packets without suffering memory latencies and delays that effectively 
serialise and slow the processing of packets. 

• It would be useful to have dedicated instructions for certain processing of network packets 
(Fletcher checksum etc.). 

• Integration of hardware assist is lacking (CAMs etc.). 

• The I/O architecture of general purpose CPUs does not fit the flow of network packets. 

• A higher work-per-cycle ratio is often needed for network processing, so that high speed 
interfaces (OC-3, OC-12, OC-48, GE) can be supported. 

• Network processing does not lend itself to symmetric multiprocessors, mainly because the 
memory bandwidth for common data structures is still a bottleneck. 
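
As an illustration of the kind of per-packet computation the list above has in mind when it asks for dedicated instructions, here is a minimal Python sketch of the 16-bit Fletcher checksum. The function name is an assumption for the example, and it shows the common modulo-255 textbook form rather than any particular protocol's variant or anything specific to Toaster.

```python
def fletcher16(data: bytes) -> int:
    """Compute the 16-bit Fletcher checksum (modulo 255) over a byte string."""
    sum1 = 0  # running sum of the bytes
    sum2 = 0  # running sum of the successive values of sum1
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1
```

On a general purpose CPU this loop costs several instructions per byte, which is why a single specialised instruction for such work is attractive in a packet processor.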

These requirements have spawned a separate class of processor termed Network or Communication 
Processors, which are CPUs designed and architected specifically to meet the needs of high speed 
data communications packet processing. 

Cisco has developed its own breed of Network Processor, which is officially termed PXF (Parallel 
eXpress Forwarder), but is known unofficially as Toaster. 

Toaster is a programmable packet switching ASIC consisting of an embedded array of CPU cores 
and several external memory interfaces. The chip may be programmed to partition packet 
processing as one very long pipeline, or into several short pipelines operating in parallel. It is 
designed primarily to process IP packets at very high rates using existing forwarding algorithms, 
though it may also be programmed to perform other tasks and protocols. 

Toaster is composed of an array of 16 CPUs, arranged as 4 rows and 4 columns. The core CPU is a 
Cisco designed CPU optimised for packet processing. A key aspect of toaster is that it is highly 
programmable, i.e. it is not a dedicated ASIC with a fixed set of functions or features that cannot be 
extended. 

In a purely parallel multiprocessor chip, each CPU core needs shared or private access to instruction 
memory for the complete forwarding code. This was ruled out both because it was an inefficient use 
of precious internal memory, and because it would be difficult to efficiently schedule external data 
accesses with so many processors running at different places in the code path. An alternative is to 
lay out the datapath into a very long pipeline; this conserves internal code space, since each 
processor executes only a small stage of the packet switching algorithm. One drawback of this 
approach is that it is difficult to break the code up into 16 different stages of equivalent duration. 
Another problem with the very long pipeline is the overhead incurred in transferring context from 
one processor to the next in a high bandwidth application. 

Toaster’s multiprocessor strategy is to aim at a configurable sweet spot between fully parallel and 
fully pipelined. The normal Toaster mode has all processors in a row operating as a pipeline, while 
all processors in a column operate in parallel with a shifted phase. Packets that enter Toaster are 
multiplexed into the first available processor row. In this mode, packets work their way across the 
pipeline synchronously at one fourth the rate that packets enter the chip. When a packet reaches the 
end of a row, it may exit the chip and/or pass back around to the first processor of the next logical 
row. This facilitates packet replication for applications such as multicast and fragmentation, as well 
as enabling the logical pipeline to extend for more than four CPU stages. 

Each column of CPUs shares the same instructions, downloaded by a supporting embedded general 
purpose CPU (which also manages the housekeeping functions, boots the system etc). Each column 
supports a 32 bit memory interface which can be either SDRAM (up to 256Mb) or SRAM. A small 
amount of on-chip shared internal column memory exists, and each CPU has a 128 byte local 
memory block. 
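
The row and column arrangement described above can be sketched as a small scheduling model in Python. This is only an illustration of the idea under stated assumptions: time is counted in packet arrival slots, packets are assigned to rows round-robin (the text says only "first available"), and the function and constant names are invented for the example.

```python
ROWS = 4  # rows operate in parallel, each with a shifted phase
COLS = 4  # the CPUs along a row form the pipeline stages


def schedule(num_packets, rows=ROWS, cols=COLS):
    """Return (packet, row, entry_slots), where entry_slots[c] is the
    arrival slot at which the packet enters the CPU in column c."""
    stage_time = rows  # a row accepts a new packet only every `rows` slots
    plan = []
    for n in range(num_packets):
        row = n % rows  # assumed round-robin multiplexing into rows
        entry_slots = [n + c * stage_time for c in range(cols)]
        plan.append((n, row, entry_slots))
    return plan


if __name__ == "__main__":
    for packet, row, slots in schedule(6):
        print(f"packet {packet}: row {row}, enters columns at slots {slots}")
```

In this model each CPU holds a packet for four arrival slots, so a row advances at one fourth of the chip's input rate, while the four CPUs in any one column start their work one slot apart.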

The current generation of toaster is implemented in 0.20µm technology with a 1.8V core, operating 
at a system clock speed of 100MHz. 

Dataflow Concept 

Toaster is fundamentally different from general purpose CPUs, because it is based on a packet 
dataflow model where the packet data passes through the ASIC rather than the typical centralised 
CPU model where the CPU fetches the data from external memory. Apart from the 4 column 
memory interfaces, two separate 64 bit wide high speed interfaces provide the input and output 
paths of the packet data; these two interfaces are complementary so that the output of one toaster 
ASIC can be joined to the input of another to provide a deeper pipeline for more sophisticated 
packet processing. The interfaces can operate at full system clock speed for a maximum throughput 
of 6.4Gbps. 

As an analogy, one of the most significant manufacturing breakthroughs of the 20th century came 
with the invention of the assembly line in the Ford Motor Company. The concept was simple. 
Previously, a car was built by laying the chassis out on a factory floor, and then workers would 
bring parts and assemble the vehicle in the same spot. This complicated the manufacturing process, 
because only a limited number of workers could operate on the vehicle, and parts stocking and 
supply was an issue. The assembly line revolutionised this process by placing the car on a moving 
assembly line that allowed specialised workers access to the vehicle at the appropriate time, 
simplifying the parts supply and access. When more automation was applied to manufacturing, this 
allowed a faster and more efficient processing of the assembly line. 

In terms of packet processing, toaster is the equivalent of an assembly line, where the packets move 
through toaster, having dedicated CPU resources applied to the packets according to the desired 
functionality. Rather than operating with primary caches dedicated to holding much used data, 
toaster’s CPUs have high speed access to the packet data itself, inverting the memory latencies 
normally suffered when using general purpose CPUs with network packet processing. Each packet 
header is passed through toaster as a 128 byte context. Copying of this context down the row 
automatically occurs as a hardware background operation while the CPU is operating on the packet 
data, removing any overhead of transferring the packet data to the next CPU in the pipeline. 

Core CPU Details 

The toaster CPU design is highly optimised for packet processing, with the following features: 

• Dual instruction decode and ALUs to allow two instruction issues per clock cycle 

• 64 bit long instruction words allowing two general purpose instructions (one to each ALU) as 
well as separate micro-ops for branch control, memory prefetch operations and other control 
instructions. 

• Specialised instructions for packet processing, such as hash instructions, checksum 
processing, atomic indirect memory operations for queueing and statistics etc. 

• 14 32-bit general purpose registers and 2 special registers. 

• 16 bit instruction address space, 32 bit data address space. 

• Support for 8, 16, 32 and 64-bit data types. 

• Multi-way conditional branching. 

• Compound-function ALU that provides combined shift and mask with arithmetic operations. 

• High performance memory interface, dedicated instruction bus plus two data interfaces to 
support simultaneous memory fetch and store operations. 

One interesting aspect of the toaster core CPU design is the memory subsystem. Prefetch micro-ops 
can be used to prefetch memory values so that maximum use can be made of the dead cycles 
normally caused by memory latency delays. These memory operations can be scheduled so that 
maximum memory bandwidth is obtained (often important, since 4 CPUs in a column share the 
same column memory interface). 

Software Considerations 

Because of the uncompromising performance requirements, developing software for toaster is 
essentially a microcoding problem, because each CPU instruction allows up to 2 general purpose 
instructions and 3 micro-ops. To get the most out of toaster, it is key to write and develop efficient 
microcode. One of the side-effects of the performance requirements is that much of the machine 
architecture is exposed to the programmer - for better or worse. Some of the more exciting 
challenges that toaster presents for the average software engineer are: 

• Using the dual issue instructions to maximise the work done for each cycle. 

• Using memory prefetching so that work can be achieved in the cycles while values are being 
fetched from memory. 

• One cycle write delays to the register file mean that when a value is transferred to a register, 
the value is not seen until one cycle after that instruction. To alleviate this, special bypass 
registers can be accessed to retrieve the previous results of either of the ALUs. The cycle 
delay means that bizarre code can be written to access the old value of the register that is still 
present in the instruction after the instruction where the new value is written! 

• Similar to most RISC CPUs, toaster has a branch delay slot where the instruction after a 
branch is fetched and executed. Unlike RISC CPUs, however, a micro-op qualifier can 
optionally cancel the delay slot instruction if a branch is taken. 
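
The one cycle register write delay in the list above is easier to see in a toy model. The Python sketch below is an assumption-laden illustration, not a description of the real hardware: the class and attribute names are invented, only one write per cycle is modelled, and a single bypass value stands in for the bypass registers of the two ALUs.

```python
class DelayedRegisterFile:
    """Toy register file in which a write becomes visible two instructions later."""

    def __init__(self, size=16):
        self.regs = [0] * size
        self.pending = None     # write issued by the current instruction
        self.committing = None  # write issued by the previous instruction
        self.bypass = 0         # result of the previous instruction's ALU operation

    def write(self, index, value):
        """Issue a register write in the current cycle."""
        self.pending = (index, value)

    def read(self, index):
        """Read a register; a value written last cycle is not visible yet."""
        return self.regs[index]

    def tick(self):
        """Advance to the next instruction cycle."""
        if self.committing is not None:
            index, value = self.committing
            self.regs[index] = value
        if self.pending is not None:
            self.bypass = self.pending[1]
        self.committing = self.pending
        self.pending = None
```

After a write and one tick, read() still returns the old value while bypass already holds the new one; after a second tick the register itself shows the new value. That is the window in which the "bizarre code" described above can deliberately use the stale register contents.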

Results 

With the use of the background context data mover, a minimum of 64 CPU cycles can be applied to 
every packet header by each CPU in toaster. This provides a maximum processing rate of about 6 
million packets per second. At this rate, some 512 CPU instructions can be applied to every network 
packet. A great deal can be done in those cycles, such as NAT processing, access list security 
filtering, IP routing, quality of service shaping and policing etc. This is approximately twice as fast 
as any other Network Processor currently available in the market. 
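
The figures quoted above can be reproduced with a few lines of arithmetic. The Python sketch below is one reading of how the numbers fit together, not a derivation given in the paper: it assumes that the four rows each accept a packet every 64 cycles, and that the 512 instructions come from 4 CPUs per row, 64 cycles each, with two general purpose instructions issued per cycle.

```python
CLOCK_HZ = 100_000_000  # system clock speed, from the text
CYCLES_PER_STAGE = 64   # cycles each CPU can spend on one packet header
ROWS = 4                # rows working on different packets in parallel
CPUS_PER_ROW = 4        # pipeline stages each packet passes through
ISSUES_PER_CYCLE = 2    # dual issue of general purpose instructions
CONTEXT_BYTES = 128     # size of the packet header context


def packets_per_second():
    """Chip-wide packet rate: each row finishes one packet per stage time."""
    return CLOCK_HZ / CYCLES_PER_STAGE * ROWS


def instructions_per_packet():
    """General purpose instruction slots available across one row."""
    return CYCLES_PER_STAGE * CPUS_PER_ROW * ISSUES_PER_CYCLE


def context_bits_per_second():
    """Bandwidth needed to move one context per packet at the full rate."""
    return packets_per_second() * CONTEXT_BYTES * 8


if __name__ == "__main__":
    print(packets_per_second())       # 6250000.0
    print(instructions_per_packet())  # 512
    print(context_bits_per_second())  # 6400000000.0
```

Under these assumptions the packet rate comes to 6.25 million per second, consistent with the quoted figure of 6 million, and moving one 128 byte context per packet works out to 6.4Gbps, matching the maximum throughput of the 64 bit interfaces at 100MHz.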

The programmability of toaster has shown itself to be a significant advantage over dedicated 
ASICs, yet not at the expense of performance, so that new algorithms and improvements can be 
delivered without any hardware changes. This is critical, especially as Internet years seem to grow 
shorter all the time. 

The first product from Cisco incorporating toaster was announced and shipped in March of this 
year (C7200-NSE) and it is expected that toaster will become a significant building block in the 
delivery of products that allow the Internet to continue to grow and develop at the rate seen so far. 


How Many Penguins Does It Take to Sell a House? 

Stephen Hodgman 

EXTENDED ABSTRACT 

The Independent Property Group is a local Canberra company that relies heavily on Linux and Open 
Source Software in its daily operations. This is not something that the management is really concerned 
about. They are pleased with the results that we have been able to provide them over the years. How we 
did it is our business. What they like is that the network and application decisions that have been made 
have demonstrably saved many thousands of dollars in licences, running costs and support. 

Our company, Namadgi Systems, has been working with the Independent Property Group (IPG) since 
1991. We developed a business application hosted on SCO Unix. Each of the five suburban offices had 
"dumb" terminals linked to head office with multiplexers using dialup modems. 

Progressively the corporate network and applications have expanded and developed. Currently each 
suburban office has a Linux server that connects to the head office via dialup PPP. The suburban office has 
an Ethernet LAN, a variety of printers and PCs. Each office runs as a distinct subnet. The suburban Linux 
server provides DHCP, IMAP mail, SMTP host, Web Server (Apache), Samba server, rsync and printing 
services plus basic IP (internal) DNS and routing. 

In head office, the SCO Unix server still hosts the core business application. This could be replaced but is 
not seen as a problem and "we don't fix what isn't broken". In addition, there are two Linux servers 
installed providing other services. 

The first is an internet gateway Linux server. This provides firewalling and email services for all incoming 
and outgoing email. It serves the external DNS as primary nameserver for the external domain, while using 
a "split DNS" to allow for internal address lookups. 

The second server has grown to provide a number of functions. These include Master Internal Web Server 
(Apache), IMAP/POP mail host, SAMBA server including network login server, dosemu server hosting a 
multi-user clipper based application across the network, network dialup hub and router, DHCP, rsync and 
printing. 

That completes(?) a cursory overview of the systems and software installed. The important question, though, is 
why all of this "Open Source Stuff" is such a good thing. From my point of view the major differences 
between this and a corresponding MS based network are: 

1. Ours is reliable. Servers regularly go unattended for months. 

2. Support. This network is easily supported. The flexibility has allowed us to have most functions 
semi-automated. 

3. Costs. There are no licence fees on the base infrastructure. Data link costs are almost nil. 

Let’s look at a few of the details and challenges: 

One of the critical requirements to be met by this installation was catering for the mobility of users. People 
regularly move between offices. We also have a mix of users with fixed workstations, notebook computers, 
and dumb terminals. This network is maintained day-to-day by the company secretary. All user creation 
and maintenance is done by him using a simple interface. All logins are the same across the network. This 
means that a user can log in from any office and get the same access privileges and services as they would 
from their desktop. 

We have been able to reduce the communications costs by using dialup lines rather than leased circuits. 
Rsync allows us to mirror static data across the network servers. Suburban offices use telnet to access the 
business applications. The core application was originally a telnet system. This worked well on the dialup 
links. However, when the property management section decided that they wanted the suburban offices to 
access a DOS based multi-user networked application we were able to demonstrate that faster data links 
would be required. DOS networking across dialup PPP links was going to be a killer! 

Enter Dosemu. By installing this on the head office server we could take advantage of the efficiency of 
telnet and use a standard DOS package. We used terminal emulators on the workstations that provided a 
customised keyboard translation. This allowed all of the "PC keystrokes" to be emulated on the workstation 
under Windows. 

We are currently investigating options for faster links. Interestingly, the network design has meant that the 
links cope most of the time. However, usage is growing and we need increased throughput. ISDN seems 
expensive to us so we are looking at multiple dialup links from each office. The details of supporting this 
in Linux PPP are being investigated at present. 


Brief Biography 

Namadgi Systems (formerly Adept Software) has been a corporate AUUG member since ’91. Our company 
hosted the first AUUG dialup email server in Canberra. 

Stephen Hodgman BE BSc is a company director. He has worked in the industry for over 20 years and has 
installed and developed many systems based on SCO Unix and Linux servers. We maintain around 15 Linux 
servers in Canberra. 

Systems Integration using Open Systems 

ABSTRACT 

The failure of a University wide Netware authentication system for students 
required a rapid solution. Several members of the Business School’s 
Information Technology Services section combined their knowledge and 
skills to produce an authentication system for the Windows NT/Novell 
network using Open Systems tools. 


1. Introduction 

The Curtin Business School (CBS) operates a LAN consisting of Microsoft NT 
Workstation clients and Novell Netware servers. Windows NT was introduced in 1997 
replacing Windows for Workgroups clients. Prior to the introduction of NT, no individual 
authentication was performed. Students were logged into generic accounts. To permit 
students to access disk storage on a server, individual authentication was required. 

The University Computing services provided a system, Student Electronic Services (SES), 
which permitted students to generate a Novell Netware Directory Services (NDS) account 
based on data downloaded from the VMS based student records system. SES had 
performed adequately in smaller departments in the university, but failed to scale to the 
requirements of CBS. CBS had 12,000 students able to access the network from 250 PCs. 
All 12,000 student accounts were placed in one NDS container object. The authentication 
server was unable to cope with the load generated. Student logins were taking 8-45 
minutes and password changes were taking greater than an hour to process. Due to a 
shortage of manpower and resources, no short-term solution was available from computing 
services. 

The situation was badly impacting our core teaching services. CBS has a good reputation 
for providing access to computing services, and this was threatened by the unavailability of 
labs. 

Returning to group logins was considered, but this would result in the loss of individual 
disk storage and difficulties with printing. CBS Information Technology Services (ITS) 
developed a hybrid solution utilizing Novell Zenworks, WinBatch, Oracle and 
UNIX/SAMBA to permit timely, authenticated access to the network. 

2. The Solution 

At the end of semester, it was decided that continuing with SES was untenable. If no 
authentication solution was found, a return to group logins would occur. There was a four-week 
window of opportunity between semesters to find a solution. CBS had a considerable 
investment in the Novell environment. A solution to the problem needed to preserve this 
investment. No additional funds were available to commit to the solution. A team of three 
members of CBS Information Technology Services was tasked with developing a solution. 

After some discussion of possible solutions, the team devised a strategy that would 
preserve the Novell environment while permitting access to online storage. The solution 
was devised in a rapid development mode, with little time for formal design. It consisted 
of using available software with small sets of code to link them. Two COMPAQ servers 
retired from Netware use were resurrected as platforms for development. The existing 
Netware servers would continue to provide access to online applications and printing via a 
generic logon. 


2.1 Oracle 

One of the COMPAQ servers was loaded with NT 4.0 and Oracle. The reason for choosing 
NT 4.0 over Solaris was financial. Curtin has a site license for Oracle on NT/Intel, not for 
Solaris/Intel. The stability of NT 4.0 is a concern and we plan to move the Oracle database 
to Solaris/SPARC as soon as the budget permits. 

An Oracle database provides a repository for students enrolled in CBS units. Oracle was 
chosen as the authentication database as i) there was in-house expertise in Oracle and ii) it 
could easily handle the projected number of transactions for students logging in at the 
beginning of a class. 

Authentication data is held in two tables. A permanent table holds the data for staff 
members and any corrections required for students. Each morning, the contents of the 
permanent table are copied into an empty copy of the daily table. Student enrolment data is 
obtained daily as a CSV file via ftp from the student records system. A script using the 
Oracle SQL Loader facility loads the data from the CSV file into the daily table. If a 
student record exists in the daily table copied from the permanent table, the loader will not 
overwrite it. 
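
The daily rebuild described above can be sketched in a few lines of Python. This is only an illustration of the "copy the permanent table first, then load the CSV without overwriting" rule, not the SQL Loader script the team used. The column names `id` and `pin`, and the in-memory dictionary standing in for the Oracle daily table, are assumptions made for the example.

```python
import csv
import io


def build_daily_table(permanent_rows, enrolment_csv):
    """Return a stand-in for the daily table, keyed by student or staff ID.

    Rows from the permanent table are copied in first. Rows from the
    enrolment CSV are added only when the ID is not already present, so
    staff entries and manual corrections are never overwritten.
    """
    # Step 1: start from an empty table and copy the permanent rows into it.
    daily = {row["id"]: dict(row) for row in permanent_rows}

    # Step 2: load the CSV, skipping any ID that already exists.
    for row in csv.DictReader(io.StringIO(enrolment_csv)):
        daily.setdefault(row["id"], dict(row))
    return daily


# Hypothetical data: S001 has a correction held in the permanent table.
permanent = [{"id": "S001", "pin": "4321"}]
enrolment = "id,pin\nS001,1111\nS002,2222\n"
daily = build_daily_table(permanent, enrolment)
print(sorted(daily))
```

Because the table is rebuilt from scratch each morning, a student who has dropped out of the CSV file simply does not appear in the new daily table; no explicit delete step is needed.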

The daily update ensures that if a student is no longer enrolled in a CBS unit, they will no 
longer be in the authentication database. Once the daily table has been created, it is used to 
produce a CSV dump file of student details for transfer to the SAMBA server. 


2.2 SAMBA 

The second COMPAQ server was loaded with Solaris. SAMBA was installed to provide 
home directories for students. Each student has an account with their Student ID number 
as the username. Each account has a 5 MB quota maintained by Solaris. One member of the 
team had been experimenting with SAMBA prior to the crisis with SES. SAMBA could 
provide the individual home directories, but this would require giving students access to the 
Windows NT NET USE commands, a considerable security risk. This was avoided by the 
use of internal drive mapping features of WinBatch (see 2.3). 

Each morning, a copy of the CSV dump file from the Oracle authentication database is 
obtained via ftp. A PERL script processes the CSV file, comparing the contents with 
/etc/passwd. 

• If the username exists in /etc/passwd, no action is taken. 

• If the username does not exist in /etc/passwd, an account is created by piping 
the student details to useradd. The shell for the account is set to /bin/false 
to prevent interactive logins via telnet. An expect script is then used to set the 
account’s password to the PIN field from the CSV file. 

• If the username exists in /etc/passwd, but not in the CSV file, then the contents 
of the account’s home directory are checked. If files exist, an archive is produced 
using tar and gzip. The account is then deleted using userdel. 

• Logs of all transactions (add/delete) are generated by the PERL script and stored for 
1 month. 
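
The comparison step in the list above amounts to two set differences. The following Python sketch shows that decision logic only; it does not call useradd, expect, tar or userdel, and it is not the team's PERL script. The function names are hypothetical, and the rule that student accounts are exactly the all-digit usernames is an assumption added here so that system accounts such as root are never selected for deletion.

```python
def parse_passwd_usernames(passwd_text):
    """Extract the username (first colon-separated field) from passwd lines."""
    names = set()
    for line in passwd_text.splitlines():
        if line and not line.startswith("#"):
            names.add(line.split(":", 1)[0])
    return names


def plan_account_changes(csv_ids, passwd_text):
    """Decide which student accounts to keep, create or retire."""
    enrolled = set(csv_ids)
    # Assumption: student usernames are numeric Student IDs, so only
    # all-digit usernames are treated as student accounts.
    existing = {n for n in parse_passwd_usernames(passwd_text) if n.isdigit()}
    return {
        "keep": sorted(enrolled & existing),    # in both: no action
        "create": sorted(enrolled - existing),  # in CSV only: useradd
        "retire": sorted(existing - enrolled),  # in passwd only: archive, userdel
    }


# Hypothetical passwd contents and CSV IDs for illustration.
passwd_text = (
    "root:x:0:0:root:/root:/bin/sh\n"
    "1001:x:5001:100::/home/1001:/bin/false\n"
    "1002:x:5002:100::/home/1002:/bin/false\n"
)
plan = plan_account_changes(["1002", "1003"], passwd_text)
print(plan)
```

Separating the plan from its execution also makes the add/delete log straightforward to produce, since the plan is already a list of transactions.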

2.3 NT Workstation Login 

All student PCs run Windows NT Workstation 4.0, which has been configured to enforce 
a strict security policy. Initially, students log in to a generic Netware/NDS account. The 
generic account runs a Netware script, which launches a WinBatch script. WinBatch 
provides scripting facilities to the GUI environment of Windows and has a large number of 
functions to provide access to the Windows environment. 

The WinBatch script displays a dialog box, requesting a Student ID number and PIN. The 
script opens an ODBC connection to the Oracle authentication database. The Student ID 
number and PIN are used to look up the daily table. If the ID is not found or the PIN 
does not match, a 0 is returned and the script logs the workstation out of the generic 
Novell account. If a correct ID/PIN combination is found, a 1 is returned. The script then 
uses a WinBatch feature to map the H: drive to the student’s home drive on the SAMBA 
server. 
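
The 0/1 lookup and the branch that follows it can be expressed compactly. The sketch below is written in Python rather than WinBatch and replaces the ODBC query with a dictionary lookup, so it illustrates the control flow only. The names `check_login` and `next_step`, and the sample ID and PIN, are hypothetical.

```python
import hmac


def check_login(daily_table, student_id, pin):
    """Return 1 for a correct ID/PIN combination, otherwise 0."""
    stored_pin = daily_table.get(student_id)
    if stored_pin is None:
        return 0  # ID not found in the daily table
    # compare_digest avoids leaking how many characters matched.
    matches = hmac.compare_digest(stored_pin.encode(), pin.encode())
    return 1 if matches else 0


def next_step(result):
    """Map the lookup result to the action the login script takes."""
    if result == 1:
        return "map_home_drive"         # map H: to the SAMBA home directory
    return "logout_generic_account"     # drop the generic Novell session


# Hypothetical daily table: Student ID -> PIN.
daily_table = {"1234567": "010180"}
print(next_step(check_login(daily_table, "1234567", "010180")))
```

Returning the same 0 for an unknown ID and for a wrong PIN means the dialog does not reveal which Student ID numbers exist.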

Novell Zenworks is launched to provide access to applications located on the Novell 
server. Netware is used to provide a scratch volume and printing services. 


3. Operation 

The system was assembled in a rapid prototyping mode, with one team member working 
on each component (Oracle, SAMBA, WinBatch). As portions of each system were 
completed, integration tests were done against the others. The system was in place for the 
beginning of semester. 

The first load of the details of approximately 12,000 students into the Oracle database took 
over 24 hours. Subsequent daily loads required less than an hour as most of the students 
were already in the database. 

Authentication between the WinBatch script and the Oracle database took less than 10 
seconds, a vast improvement on the previous system. 

The system ran for 18 months without any major fault or alteration. 

4. Enhancements 

Using UNIX as the operating system for the server containing the student home directories 
permitted a number of quick enhancements to provide additional services for students. 

4.1 ftp access 

Under Novell, students had only been able to access their home directories while in the 
laboratories. Access to the home directories via ftp was enabled, permitting access from 
other locations and via the Internet. 

4.2 Student web pages 

The Apache web server was installed on the SAMBA server permitting students to place 
web pages in a WWW subdirectory of their home directory. 

4.3 Electronic Mail 

After six months of operation of the original system, an additional COMPAQ server became 
available to provide reliable student email access. The server, loaded with Solaris, uses 
sendmail and a POP daemon to provide students access to email. Accounts are 
generated using the same dump file as the SAMBA server. The PERL script from the 
SAMBA server was modified to either produce accounts for home directories if run on the 
SAMBA server or to provide access to their mail via POP if run on the mail server. There 
had been a Novell/Mercury mail solution available via SES, but it suffered from the poor 
performance as documented above. 

Purchasing the SIMS web-based mail system from the SUN/Netscape alliance has recently 
further enhanced the e-mail system. This replaces the POP access with IMAP and provides 
Web access to email. 

4.4 Dynamic home directory creation 

When the system was first developed, the PERL script on the SAMBA server created a 
home directory for all students in the imported CSV file. An examination of the access 
logs for this server revealed that there were a significant number of students who never 
logged onto the system. An option in a later version of SAMBA, root preexec, 
allowed a script to run when a user connected to the SAMBA server. This is now used to 
create the home directory when the user first uses their account. 

In the initial version of the system, all home directories were created directly under 
/home. Home directories are now created under /home/0 ... /home/9, depending on 
the last digit of the student ID number. This was done for performance reasons and to 
assist in the backup process. 
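
A minimal sketch of the directory layout and the create-on-first-use step follows, in Python. It assumes the script is handed the Student ID as a plain numeric string; how the script hooked into root preexec actually receives the username, and the permissions it sets, are not described here, so the `0o700` mode and the function names are assumptions.

```python
import os
import tempfile


def home_dir_for(student_id, base="/home"):
    """Return the home directory path, bucketed by the last digit of the ID."""
    if not student_id.isdigit():
        raise ValueError("student ID must be numeric: %r" % student_id)
    return os.path.join(base, student_id[-1], student_id)


def ensure_home_dir(student_id, base="/home"):
    """Create the home directory if it does not exist yet and return its path."""
    path = home_dir_for(student_id, base)
    # exist_ok makes the call safe to repeat on every connection.
    os.makedirs(path, mode=0o700, exist_ok=True)
    return path


# Demonstration against a scratch directory rather than the real /home.
with tempfile.TemporaryDirectory() as scratch:
    created = ensure_home_dir("20001234", base=scratch)
    print(os.path.relpath(created, scratch))
```

Spreading accounts over ten subdirectories keeps each directory to roughly a tenth of the student population, which is the performance and backup benefit noted above.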


5. Criticisms 

The system was constructed rapidly to solve an acute problem. This did not permit the 
team time to reflect on the solution. Most of the criticisms are of a security nature. 

5.1 One size fits all 

In this solution, all students receive identical accounts. It is possible to create groups on 
the Solaris system and grant extra storage, but as all students login to the one Novell 
account, they all have the same privileges on the NT workstation. 

5.2 PIN Numbers 

The initial PIN number is the student’s date of birth. This was intended to be replaced by a 
university standard PIN number used to access online students services. Through 
administrative problems, this became unavailable in the student records dump file. The 
PIN number remained the student’s date of birth. With the login ID being the student’s ID 
number, it was possible for peers with personal knowledge of an individual to gain access 
to home directories, web pages and email. This will be remedied when the University’s 
new student records system permits students to alter their own PIN number. 

5.3 ftp of Student records 

Student record CSV files were retrieved from the VMS system using standard ftp. If a 
sniffer program had access to the cable system in question, it could get the clear-text 
username and password. This could jeopardize the security of all student accounts. This 
was also true of the transfer of the dump files from the Oracle server to the SAMBA server 
and the Mail server. File transfer using the facilities of ssh has been implemented to 
reduce this risk. 

5.4 ODBC 

The ODBC connection between the NT workstation and the Oracle server is not encrypted. 
This leaves the possibility of the LAN being sniffed to obtain a user ID and PIN. The 
student laboratories all use fully switched Ethernet, which reduces the risk. The use of 
secure sockets was considered, but was not a possibility from WinBatch. 


6. Evaluation 

The system was developed very quickly to solve an urgent problem. The success is due to 
the philosophy of using proven components to solve individual tasks and the availability of 
Open Systems tools to provide the glue to assemble those components into a functioning 
system. 

7. The Future 

The system is under constant review and changes in the environment will lead to 
enhancement or replacement. The WinBatch script has recently been replaced by a Delphi 
program to permit further enhancement. 

CBS will be migrating the NT 4.0 Workstations to Windows 2000 in the next 12 months. 
Investigation is under way for alternate authentication mechanisms. 

Planning is underway to replace the Intel based COMPAQ servers with SPARC servers, 
including the move of the Oracle system from NT to Solaris. 

Open Content Licenses 

Much software is now available under Open Source licences, which apply to the public 
availability of source code. There are many Open Source licenses, with a variety of 
conditions to cover different circumstances. However, there are many other types of 
information covered by copyright, and there are Open Content licenses for these. This 
paper discusses these licenses. 

1. Introduction 

The Open Source movement has crystallised a number of ideas about the public availability of 
source code that have been practised both formally and informally for a number of years [1]. The 
GNU General Public License has represented one extreme of this, as it guarantees that not only must the 
source covered by this license be public, but all derivative works must be covered by this license 
and also be public. There are other Open Source licenses such as the Berkeley license, which 
simply make the source code available, with no restrictions on further use. 

Source code is “creative content” that is subject to copyright, and so may have an open copyright 
license applied to it. The concept of copyright covers a much wider range of material, such as 
images, novels, computer documentation and various types of performances. Licenses such as the 
GNU General Public License are not always applicable to such material, and a number of different licenses 
have been proposed to deal with these in an “open” manner. 

Of particular importance to the computing community is documentation material for computer 
systems, or indeed, any written material. Some of the “open” licenses refer specifically to written 
materials. 

This paper surveys some of the different Open Content licenses, which can cover copyright 
materials other than computer source code. 

2. Copyright 

Copyright law is a complex area, which varies across countries. In Australia, copyright may be 
applied to literary works (including computer program source), dramatic works, musical works, 
artistic works, films, sound recordings, broadcasts, published editions and certain types of 
performances [3]. This is much wider than the simple domain of computer source code. 

There is a wider category, too: that of Intellectual Property. This includes patents, copyright, trade 
marks, designs, trade secrets and many other areas. There seems to be little attempt as yet to apply 
Open Content principles to these, although some areas of this (in particular, computer patents) 
would seem to be worthy of attention. 

3. Open Source Licenses 

There are a large number of licenses that can be branded “open”. The OpenSource Web site 
contains definitions of Open Source, and also contains a list of licenses which conform to this 
definition [2]. The definition includes 

• free redistribution of software 

• source code availability 

• allow derived works 

The best known licenses probably are the Berkeley license, the GNU licenses and the Perl artistic 
license, all of which may be found on the OpenSource site. 

3.1 Berkeley License 

The Berkeley license was originally devised for the BSD Unix distributions. It changed over the 
years, following some court actions. 

It is a very simple license, that allows redistribution in source or binary form, as long as the 
copyright notice is retained. It also contains a disclaimer of liability which looks sufficiently 
legalistic to have been framed by lawyers and perhaps to have good legal meaning. It does not place 
any requirements on derived software. 

3.2 Perl Artistic License 

The Perl Artistic License is one of the possible licenses for Perl distributions, or for perl packages. 

It is fairly flexible in that it allows choices on distribution and modification. 

If a package is modified so that it changes from a “Standard Version”, then these changes must be 
posted in some way 

• the changes can be posted on Usenet or similar; or 

• the new executables must be renamed to distinguish them from a Standard Version, and there 
must be man pages explaining the differences; or 

• the package can be used privately, so that the changes need not be made public; or 

• other distribution arrangements can be made with the original author 

Distribution of the software can be done in a variety of ways, too 

• a binary distribution can be done, along with instructions of how to get a Standard Version in 
source code form 

• source code can be distributed freely 

• other distribution arrangements can be made with the original author 

In all of these, the original copyright notices must be duplicated in the distribution. 

You can charge a reasonable copying fee and charge for support of the package. But you cannot 
charge a fee for the package itself. But if you use the package to create programs, then you own the 
copyright on these programs, and are allowed to sell them commercially, or give them away - they 
are yours. 

3.3 GNU General Public License 

The GNU General Public License (GPL) comes from the Free Software Foundation (FSF), which is 
responsible for a huge range of software tools such as emacs, gcc, Unix utilities, 
compiler-compilers, and so on. The GPL is possibly the best known Open Source license. Since it is 
unlikely that Linux would ever have been produced without use of the GNU tools, the FSF claims 
that Linux should be properly known as GNU/Linux. 

The GPL is a very purist license, which works from the philosophical basis that software should be 
free, and that once software is free the license should protect this freedom. Note that “freedom” 
refers to freedom of access to source code, not free as in costing nothing. A distributor of software 
covered by the GPL is free to charge a reasonable distribution fee, to charge for support and to 
charge for instructional material about the software. If this were not the case, Red Hat and other 
Linux vendors would not be able to charge for their distributions, and O’Reilly would be unable to 
sell books about GNU tools. 

The GPL covers the activities of copying, distribution and modification. Copying and distribution 
requires you to maintain and publish the copyright notices on the software. Like other licenses, this 
ensures that authorship can never be removed, so that the intellectual property ownership remains 
with the originators. If the software is modified, then it may be redistributed in source or binary 
form. However, even if it is in binary form, there must be reasonable access to both the original and 
to your changes. In other words, you cannot make changes to the software and hide them if you 
make the executables available to others. 

There is often confusion about products created using software covered by GPL. The most common 
case is a program in C/C++ that is compiled by a GNU compiler. The result of running the compiler 
is not covered by the GPL. That is, the executable produced from your source code belongs to you, 
and is not covered by the GPL. You are free to place whatever copyright license you want on your 
executables. You do not have to make your intellectual property publicly available, although the 
open community would of course prefer you to do so. 

3.4 GNU Lesser General Public License 

The GNU Lesser General Public License (formerly the Library General Public License) is known as the LGPL. 
The LGPL is a weaker form of the GPL, and is mainly intended for use with libraries of code. If a 
library is covered by the GPL and is linked into a program, then that program is considered to be a 
derivative of the library and so it must be covered by the GPL. From a purist viewpoint this is good: 
it forces everyone into making their software GPL’ed if they use your library. 

However, purism can have the effect of killing use of the library. For example, the GNU libc is 
linked into every C program. If this was under the GPL, then every program compiled and linked 
with GNU libc would have to be GPL’ed. No commercial vendor would be likely to use the library. 

The LGPL thus distinguishes between distribution and modification of the library itself, and 
distribution of programs which just “use” the library. Modifications to the library are dealt with in 
a similar manner to the GPL, but programs linked with the library are called “a work that uses the 
library” and are not covered by the LGPL. That is, they can be sold, given away, and can be 
covered by the GPL or any other license. 

3.5 Sun Community Source License 

The Sun Community Source License is not an open license [4]. It represents an attempt to combine 
elements of the Open Source licensing with “commercial reality”. It allows you to distribute, 
modify and use software as long as you make no commercial gain directly from the software. As 
soon as you do - and this includes using it as a library - then you have to start paying fees to the 
copyright owner. 

At present this license mainly refers to various Java libraries. Nobody yet knows if this is helping 
(source code availability) or hindering (pay later) software development using these libraries. 

4. Open Content License 

The Open Content License (confusingly called the OPL) is intended to cover any copyrightable 
material that is digitally available [5]. It covers digital images, documentation, educational 
materials, digital videos, etc. Essentially, it is based on the GNU GPL, with specific reference to 
software removed and replaced by “content”, or simply eliminated. 

The Open Content License allows free use of content, while maintaining copyright ownership by 
the author(s). Modifications are allowed, but if you redistribute the changed version, then the nature 
of the changes must be made public. The license does not make clear how this is to be done. For 
example, for documentation a list of changes can be added, or change-bars inserted in the 
document, etc. However, if an image is modified, then it may be necessary to add the information in 
as metadata. This can be done for some file formats such as PNG, but not for all. 

5. Open Document Licenses 
5.1 Open Publication License 

The Open Publication License [5] was derived from the Open Content License after a sufficient 
number of authors began publishing works under this license, and the more traditional book 
publishers (such as O’Reilly!) found problems with this license. The Open Publication License has 
no formal acronym - while it should be called the OPL, this has already been used by the Open 
Content License. 

The Open Publication License covers documents, such as online tutorials, books, letters and 
program documentation. Since it deals with text documents, it is able to give a definition of what 
comprises a “modified work”. This includes translations, anthologies, compilations and extracts of 
the document. This differs from the meaning of “modified/derived work” for software. 

A major difference between this license and others lies in two optional restrictions that may be 
added by authors. The first is to ban “substantial modifications” without explicit permission. This 
is true for “traditional” copyright notices as found in books. Since the Open Publication License 
allows modifications, this brings it back in line with these notices. One could imagine Shakespeare 
(or any author) using this to forbid the Reader’s Digest condensed versions of their work! 

The other optional rider is to forbid publication in paper format without explicit permission from 
the copyright holder. This gives traditional print publishers the confidence they need to print books 
without fear that someone else can simply take their editorial and publication work and use it for 
free. 

5.2 Linux Documentation Project 

The Linux Documentation Project [6] is an attempt to document all features of Linux, especially for 
configuration issues for various pieces of hardware and software. This involves “HOWTO’s” and 
“Guides”, along with manual pages, info docs, etc. Collections of these are often made into books 
such as the “Linux Bible” or “Linux Encyclopedia”. 

The existing HOWTO’s have been covered by a variety of licenses. For example, the 1996 version 
of the Linux Encyclopedia contains these copyright notices 

• Licenses based on the GNU General Public License 

• Adhoc licenses including statements such as “if you make money with it, the authors want a 
share” 

• No copyright claim at all 

Due to this variation, the Linux Documentation Project currently includes a “boilerplate” license 
that can be used as-is, or modified as desired. This allows anyone to freely copy or distribute (sell 
or give away) the document in any format. The document can be modified or a derivative work 
created, as long as the derivative uses the same license or the GNU GPL. The derivative must also 
be made available on the Internet in some manner, such as by sending it to the Linux 
Documentation Project. 

This license ensures that derivative works are also Open Documentation works, and that changes 
must be made public. It does not include waivers of liability, and may be problematic in litigious 
countries. In the future, it may change the reference to the GNU GPL to the GNU Free 
Documentation License. 

5.3 GNU Free Documentation License 

The GNU Free Documentation License preserves the spirit of the GNU GPL, adapting the details to 
open documentation. It preserves the rights to copy the material, to distribute it freely and to modify 
it. A key distinction made for documents is the difference between “transparent” and “opaque” 
documents. This distinction is made for purposes of modification. 

Transparent 

A transparent copy of a document is a machine-readable copy that is in some sense in an 
“open” format that can be edited by generic text editors. Examples include LaTeX, plain ASCII, 
simple HTML, and XML or SGML where the DTD is publicly available. It also includes 
images which can be edited using generic paint programs. 

Opaque 

An opaque copy of a document cannot easily be edited in this way. It includes Postscript 
(technically possible, but practically not), HTML produced by many HTML editors, 
proprietary file formats and formats which have been deliberately “mangled”. 

A document can be distributed or modified and the version passed along in transparent form. If it 
has been made into an opaque form, then it must be possible to get the transparent form, either from 
the network or by including the transparent copy. 

Some material in a document should not be changed: such as the background of the authors, legal 
disclaimers, etc. These can be marked as “invariant” matter and cannot be modified. 

A document may have “cover texts” either front or back. If the document is printed in bulk, then 
these covers must form part of each printed copy. This allows an author to specify a cover for 
non-trivial print-runs. 

6. Personal Experiences 

I have authored two books, published by international publishers. The first was an awful 
experience, with the publisher demanding changes based on a reviewer who really wanted to write a 
different book, and saw my book as a means of getting someone else to do it. The publisher enforced 
these changes until this reviewer dropped out. The current version of the book, with the original 
book proposal, was sent to a new reviewer, who suggested I throw out all the changes and revert to 
the original. At which point I became upset (!-) and insisted on publication as it was then. 

The second publisher was much more reasonable (Addison-Wesley), and gave me lots of valuable 
and timely advice. However, since I had approached them, they still maintained control of the 
project, and I had to make some changes against my preferences. On the whole, their changes were 
good, and I would not mind working with that particular group of staff at Addison-Wesley again. 

I did not make money out of these two books. The first sold 2,000 copies, the second sold 8,000 
copies. The first was in a relatively obscure area (parallel Prolog) and was mainly of academic 
interest. The second was on X-Windows and Motif, and could have sold better if it was promoted 
more by the publisher. These two experiences led to the conclusion that if I wanted to write books, I 
should do it for my own reasons: fun and interest, rather than to make money. 

At the beginning of 1999, I knew nothing about Jini, and neither did anyone else. There were a 
couple of short tutorials and examples, and a difficult-to-read formal specification [8]. So I began to 
write a book about it: as I discovered things, I wrote another chapter. The traditional way is to do all 
this in your closet, late at night when no-one is watching. After my previous experiences, I went the 
Open Content way, and published the material on the Web from the beginning, using the Open 
Content license [9]. It started at 10 pages, and is currently at 350 pages. 

Open publishing has had many benefits. I have had feedback from the beginning, which still 
continues. Links to my tutorial/book have been made from many sites, including the Jini FAQ [10]. 
Sites which rate Jini materials place my site in the top ten. Eventually, a publisher approached me, 
offering to publish the book in hard-copy. Negotiations took place - but in this I had the 
upper-hand, since I was quite happy to continue working without a publisher, whereas he needed 
authors to survive. My concession was to change the license to the Open Publication license, with 
that publisher having sole rights to printed copies. 

I have no idea if I will make money from this book. It doesn’t matter, since I have regular 
employment, and have used this as a base to contribute Open Source for many years. This has just 
changed to an Open Content project, which is just as much fun, and avoids the hassles of dealing 
with regular publication systems which tend to make it hard for authors. 

7. Conclusion 

This paper has discussed some of the ways in which the original Open Source licenses have been 
extended to deal with other types of Open Content. Even for particular types of content, there are 
often many different licenses, based on different philosophical backgrounds. This allows an Open 
Content author to choose a license that suits their requirements. 


Performance Monitoring of Distributed MPI Applications 


by Ken McDonell, Performance Tools Group, SGI 

The MPI (Message Passing Interface) provides a paradigm for the development of 
parallel codes, where the algorithms and data structures are suited to coarse-grain 
synchronization. MPI-based applications are often complex and difficult to optimize, 
especially when multiple threads of the computation are distributed across more than 
one host. As arrays and clusters with large processor counts become more widely 
available, tuning MPI applications with high degrees of parallelism is an increasingly 
daunting task. 

In part due to the availability of open source MPI implementations and the open 
platform clustering efforts such as Beowulf, MPI applications are often to be found 
executing on clusters of Linux® systems. 

Performance Co-Pilot (PCP) is designed for systems-level performance monitoring and 
performance management. The PCP product(s) have been particularly successful 
amongst SGI’s customers with large and complex IRIX deployments. In December 
1999, SGI released much of the PCP infrastructure and services as open source. 

By combining the "wrapper" architecture of many MPI library implementations with the 
open sourced APIs of PCP, we’ve been able to develop a methodology for instrumenting 
both the call frequencies and the distribution of time between the application and 
individual routines of the MPI library. Once the MPI performance data is being 
collected within each thread (or rank), an efficient shared memory transport is used to 
export the information to a PCP collection agent, and then the PCP protocols are used to 
move the data from one or more nodes in the cluster to a monitoring application. The 
technology has been used to build a sophisticated set of performance monitoring services 
for MPI applications, culminating in a 3-D visualization of MPI performance for each 
process in a co-operating computation spanning multiple hosts. This framework can be 
used to correlate MPI activity with demands for system resources, or to collect archives 
of performance data for subsequent analysis, or to construct real-time alarms to 
automatically monitor production codes. 

Examples will be drawn from MPICH applications running on Linux clusters. 

Linux is a trademark of Linus Torvalds. 

1. Brief Introduction to Performance Tuning Issues for MPI 
Applications 

An MPI job usually consists of two or more parallel threads of computation (or "ranks" in 
MPI-speak) that are working towards the solution of a common problem using a combination of 
private computation and co-ordination by message passing. Each rank is typically associated with 
its own process, and the processes of a single MPI job are distributed across one or more hosts. 

The MPI message passing fabric consists of one or more implementations built on shared memory 
or high-speed networking. 

The message passing programming model embodied in the MPI APIs means applications are 
exposed to tuning and performance analysis issues related to: 

• message passing throughput (as determined by a complex interaction between the physical 
communications channels between the communicating processes, the distribution of message 
sizes, discontinuities in the message passing fabric as one moves from "within host" to 
"between host", etc.) 

• message passing latency (similar factors to throughput, but many networking technologies 
that do well at throughput have long latencies and vice versa) 

• distribution and duration of time spent waiting at rendezvous points 

• frequency and type of calls on the MPI library 

Of course these issues are in addition to the normal algorithmic efficiency concerns that are 
common to both parallel and non-parallel codes. 

2. Existing Techniques 

Most MPI library implementations are as shown in Figure 1, where the routines called from the 
MPI application, e.g. MPI_Foo, are actually empty wrappers with the real functionality 
implemented in routines behind the wrappers, e.g. PMPI_Foo. Depending on the implementation as 
either dynamic or static libraries, assorted techniques are available at run-time or at link-time to 
replace the empty wrappers with instrumented wrappers to collect performance data as control 
passes from the MPI application to the real MPI routines and back again. 
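
The wrapper technique can be illustrated outside MPI. The following Python sketch is an analogy only, not the mpich or PCP code: pmpi_send is a hypothetical stand-in for a real PMPI routine, and instrument builds the replacement wrapper that counts calls and accumulates elapsed time before handing control back to the caller.

```python
import time
from collections import defaultdict

# Per-routine statistics: call count and accumulated elapsed seconds.
calls = defaultdict(int)
elapsed = defaultdict(float)

def pmpi_send(buf, dest):
    """Stand-in for the real routine behind the wrapper (hypothetical)."""
    return len(buf)

def instrument(name, real_routine):
    """Build an instrumented wrapper that forwards to real_routine."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return real_routine(*args, **kwargs)
        finally:
            # Record the call even if the real routine raises.
            calls[name] += 1
            elapsed[name] += time.perf_counter() - start
    return wrapper

# The application calls mpi_send; the wrapper records statistics and
# passes control to the real routine and back again.
mpi_send = instrument("MPI_Send", pmpi_send)

for rank in range(3):
    mpi_send(b"hello", dest=rank)
```

In a real MPI library the substitution happens at link-time or run-time rather than by rebinding a name, but the flow of control is the same.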



Figure 1: Typical MPI library structure with "wrapper" routines. 

2.1 Profiling Libraries (mpich) 

One of the most widely available MPI implementations is the mpich distribution [1,2]. Included 
with mpich are several profiling libraries that use the wrapper architecture. 

• Time Spent 

Using timestamps in the MPI layer, the time spent in each PMPI routine is accumulated for 
each rank and reported at the end of the job. 

• Event Trace Logfiles 

Using timestamps in the MPI layer, event records are exported describing the times at which 
control passes from the application into individual PMPI routines and back again. Events are 
collected for each rank, and a merged event trace across all ranks may be provided. 

Once dumped to a logfile the sequence of events is used as input to various retrospective 
analysis tools. Perhaps the best known of these is Jumpshot, which is used to graphically 
display the flow of messages and states across the ranks. 

• User Defined State Transitions 

An extension to the event tracing at the MPI layer allows applications to emit additional event 
records to describe a more macroscopic view of the application’s execution, e.g. a data 
distribution phase that might involve many MPI calls. 

• Real-time Animation 

Additional code in the MPI layer generates direct calls to a monitoring application that uses 
the information to animate a GUI display of activity across the ranks of the job. 

While these services are useful in their own right, they fall short of the sort of capabilities needed to 
tackle some hard classes of tuning problems. For example: 

• There is no support for temporal correlation between MPI activity and other resource 
consumption statistics, such as CPU time usage, system call and context switch behaviour, 
demand for file system throughput. 

• For cluster-wide monitoring, a transport mechanism is required to ship performance data from 
the distributed computation to a central point for integrated analysis and logging. The mpich 
tools do this by shipping the event logs at the end of the job for post mortem analysis, but we 
need more sophisticated transport mechanisms to implement real-time monitoring while the 
MPI application is running. 

• For hard performance problems, the ability to revisit earlier experiments and evaluate new 
hypotheses about an application’s behaviour is very powerful. But this requires a reliable and 
flexible archival mechanism to capture, and later replay, key information about the execution 
of the application and the surrounding environment. 

• Once codes go into production, there is a need to refine the performance monitoring to allow 
abnormal behaviour to be detected and appropriate alarms to be raised. 

Fortunately, the Performance Co-Pilot services provide a ready-made solution to these additional 
requirements, and therefore we are in a position to offer even better performance monitoring and 
performance management services as an extension of the concepts in the base profiling libraries. 

3. Exporting Performance Data from an MPI Application into 
the PCP Framework 

It is necessary that all processes associated with an MPI job can be identified, even when they are 
spread across multiple hosts of a cluster. Obviously, the MPI libraries maintain this information, but 
it is generally not visible from the "outside". To overcome this we’ve borrowed a concept from the 
IRIX array services, namely the Array Session Handle (or ASH). Each MPI job has a unique 
ASH, and this is known to each process in the MPI job (in IRIX this is an additional attribute of 
each process, in the Linux case we simply arrange to pass the ASH around in the environment when 
mpirun launches the MPI job). 

3.1 Collecting Data for Each Rank 

Each rank maintains private counters of the number of calls and optionally the time spent across the 
application code and the MPI libraries. 

Because there are typically many routines in the MPI library that are not used by a particular 
application, or rarely used by any MPI application, we provide the user with a mechanism to 
nominate individual MPI routines for which detailed statistics are to be collected. The statistics for 
all other MPI routines are aggregated into an other category. So the state of the application’s 
execution is classified as always being one of: 

• appl - in the user’s application code (or possibly the operating system kernel), but definitely 
not in the MPI library, or 

• one of the designated MPI routines - the state names are derived from the MPI routine names, 
or 

• other - in the MPI library, but not in one of the designated routines. 

A control file is used to nominate the MPI routines of interest (by name), the state names (or 
labels) and the preferred color to be used in the visualization tools (the special states appl and 
other are pre-assigned the colors green and yellow respectively). For example: 

    # functions of interest
    #
    # MPI functions chosen from /usr/include/mpi.h
    #
    # label       MPI function     color
    #
    send          MPI_Send         blue
    recv          MPI_Recv         orange
    sendrecv      MPI_Sendrecv     violet
    bcast         MPI_Bcast        burlywood
    barrier       MPI_Barrier      wheat
    allreduce     MPI_Allreduce    turquoise
    reduce        MPI_Reduce       cyan
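
A control file of this shape (a label, an MPI function name and a color per line, with "#" comment lines) is simple to read. The following is a minimal sketch, assuming whitespace-separated columns and "#" starting a comment, as the example suggests; parse_control is a hypothetical name, not part of PCP.

```python
def parse_control(text):
    """Parse 'label  MPI-function  color' lines, skipping comments and blanks."""
    states = []
    for line in text.splitlines():
        # Assumption: '#' starts a comment that runs to the end of the line.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got: {line!r}")
        label, function, color = fields
        states.append({"label": label, "function": function, "color": color})
    return states

example = """\
# label    MPI function    color
send       MPI_Send        blue
barrier    MPI_Barrier     wheat
"""
states = parse_control(example)
```

Each entry then names one designated state; calls to any MPI routine not listed would be accounted to the other state.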


Because time spent waiting to receive a message or reach a rendezvous may be an important aspect 
of understanding the performance of an MPI application, and because MPI jobs are often run on 
dedicated processor resources, all accounting of time in the PCP support for MPI uses elapsed time 
(rather than CPU time). 

To support the needs of multiple concurrent monitoring applications, there is no notion of "average" 
resource utilization or call "rates" at the point of collection, rather all instrumentation is exported as 
free running counters since the start of the rank. The monitoring tools will convert this raw data as 
required to support the aggregation needs of the end-users trying to assess the behaviour of the MPI 
job, typically as calls per second for one or more ranks and time utilization for individual MPI 
routines and/or individual ranks over some repeating sample interval. 
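
The conversion from free running counters to rates is left to each monitoring tool, so two tools sampling at different intervals never interfere with one another. A minimal sketch of the arithmetic, with made-up sample values and hypothetical function names:

```python
def rate(prev, curr, interval):
    """Convert two samples of a free-running counter into a per-second rate."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return (curr - prev) / interval

def utilization(prev_time, curr_time, interval):
    """Fraction of the sample interval spent in one state.

    The inputs are samples of an elapsed-time counter, in seconds.
    """
    return rate(prev_time, curr_time, interval)

# Two samples taken 5 seconds apart (made-up values).
calls_per_sec = rate(1200, 1700, 5.0)       # call counter
busy_fraction = utilization(3.0, 4.0, 5.0)  # seconds spent in a state
```

Here the call counter advanced by 500 in 5 seconds (100 calls per second), and the rank spent 1 of the 5 seconds in the state (20% utilization).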

In the SGI implementations of MPI we have extended statistics that track message volumes 
(message and byte counts) for assorted transport protocols. These are not available for mpich and 
in the discussion below we shall concentrate on the statistics that are commonly available for MPI 
implementations. 

The counters are exported from the rank’s address space using shared memory via mmap’ed files. 
Every rank must call MPI_Init initially, and the PCP wrapper for MPI_Init arranges to create a file in 
a well-known place with a well-defined name and structure. The creating process uses mmap to 
make the file contents directly modifiable without further system calls. The layout of the shared 
memory structure is: 

• A fixed size header as follows: 

    ○ a version number for the structure definition (as an integrity check) 
    ○ the total length of the mmap area (as an integrity check) 
    ○ the most recently observed state for this rank 
    ○ the number of extended statistics that are available (0 for mpich) 
    ○ the number of states (or functions) for which statistics are being collected 
    ○ the size of the statistics structure for each state (or function) 
    ○ the maximum size of a label for a state 
    ○ a boolean to indicate if the application is pthread based 
    ○ the running total of state transitions seen for this rank 
    ○ the id of the process associated with this rank 

• Then follow pointers to the base of the variable length segments, namely: 

    ○ the per state statistics 
    ○ the extended statistics (3 arrays, all NULL for mpich) 
    ○ the trailing version number 

• For each state, the following counters are maintained: 

    ○ the number of transitions into this state by this rank; for appl this is the number of times 
      we’ve returned from the MPI library back to the calling application, for other this is the 
      number of times all the "other" MPI routines have been called, else it is the number of 
      times a designated MPI routine has been called 
    ○ the running total of elapsed time spent in this state by this rank 
    ○ the unique identification of the state, as extracted from the control file 
    ○ the name of the state, as extracted from the control file 

• The arrays of the extended statistics (if any) are next. 

• The trailing version number appears again at the end of the area, as an integrity check against 
a corrupted or truncated shared memory area. 
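
The general shape of such an area can be sketched with Python's mmap and struct modules. The layout below is a much reduced, hypothetical one (five header fields, one count and one elapsed time per state, and a trailing version number); the field order, sizes and file name are assumptions for illustration and not the actual PCP structure.

```python
import mmap
import os
import struct
import tempfile

VERSION = 1                       # hypothetical structure version
HEADER = struct.Struct("<IIIII")  # version, total length, state, nstates, pid
STATE = struct.Struct("<Qd")      # transition count, elapsed seconds
TRAILER = struct.Struct("<I")     # trailing copy of the version

def area_size(nstates):
    return HEADER.size + nstates * STATE.size + TRAILER.size

def create_area(path, nstates):
    """Writer side: create the file, size it, map it and fill in the header."""
    size = area_size(nstates)
    with open(path, "w+b") as f:
        f.truncate(size)
        area = mmap.mmap(f.fileno(), size)
    HEADER.pack_into(area, 0, VERSION, size, 0, nstates, os.getpid())
    TRAILER.pack_into(area, size - TRAILER.size, VERSION)
    return area

def bump(area, state, seconds):
    """Writer side: record one transition into `state` with plain memory stores."""
    off = HEADER.size + state * STATE.size
    count, total = STATE.unpack_from(area, off)
    STATE.pack_into(area, off, count + 1, total + seconds)

def read_area(path):
    """Reader side: map read-only, apply the integrity checks, return counters."""
    with open(path, "rb") as f:
        area = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    version, size, _state, nstates, _pid = HEADER.unpack_from(area, 0)
    (trailer,) = TRAILER.unpack_from(area, len(area) - TRAILER.size)
    if version != VERSION or trailer != version or size != len(area):
        raise ValueError("corrupted or truncated statistics area")
    stats = [STATE.unpack_from(area, HEADER.size + i * STATE.size)
             for i in range(nstates)]
    area.close()
    return stats

path = os.path.join(tempfile.mkdtemp(), "rank0.stats")
area = create_area(path, nstates=3)
bump(area, 1, 0.25)
bump(area, 1, 0.50)
area.flush()
stats = read_area(path)
```

The leading and trailing version numbers bracket the area, so a reader that finds them unequal, or finds a length that disagrees with the file size, can reject the area rather than report garbage.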

The following environment variables may be used to control the interaction between an MPI 
application and the PCP collection infrastructure: 

• MPI_PCP_CONTROL - alternate pathname for the control file 

• MPI_PCP_TIMING - collect timing statistics in addition to the function call frequency 
statistics (timing statistics are obviously more expensive with 2 calls to gettimeofday for each 
MPI library call, but the value of the information gathered is typically worth the extra 
overhead). 

• MPI_PCP_STATLIMIT - when extended statistics are supported, set the MPI call count or 
elapsed time window after which the extended statistics will be refreshed. 

• MPI_PCP_AGENTCHECK - check if the PCP MPI agent (see section 3.2) is running when 
the job starts, and disable instrumentation if it is not. 

3.2 Aggregation Across Ranks on a Host 

A custom built PCP agent (or more correctly Performance Metrics Domain Agent, or PMDA) is 
responsible for moving the MPI instrumentation from the mmap’ed files into the PCP framework. 

When asked to provide values for metrics, this PMDA will scan the well known directory, noting 
any files that have gone away as a consequence of MPI jobs finishing and noting recent additions as 
a result of new MPI jobs starting. As required, the files are opened and mmap’ed. Because the 
PMDA requires only read access and because all updates to the data from the MPI application are 
atomic (or nearly atomic) there is no locking required (for each of the shared memory areas it is a 
single reader and single writer scenario). This means the overheads are very small, and so the 
instrumentation framework does not distort the behaviour of the MPI applications. 
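
The directory scan amounts to comparing the current directory contents with the set of files seen on the previous scan. A minimal sketch, using a temporary directory and made-up file names (the real directory and naming scheme are not given here):

```python
import os
import tempfile

def scan(directory, known):
    """Compare the directory contents with the set of files seen last time.

    Returns (current, added, removed) as sets of file names.
    """
    current = set(os.listdir(directory))
    return current, current - known, known - current

# Simulate one job starting, then one rank finishing and a new job starting.
stats_dir = tempfile.mkdtemp()
for name in ("job1.rank0", "job1.rank1"):
    open(os.path.join(stats_dir, name), "w").close()
known, added, removed = scan(stats_dir, set())

os.remove(os.path.join(stats_dir, "job1.rank1"))
open(os.path.join(stats_dir, "job2.rank0"), "w").close()
known2, added2, removed2 = scan(stats_dir, known)
```

Files in the added set would be opened and mapped, and mappings for files in the removed set would be released.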

As shown in Figure 2, the MPI PMDA also needs to read the control file to correlate this with the 
control information in the shared memory areas. 



[Figure 2 diagram not reproduced. Surviving labels: "mmap & store", "stats files", "mmap & fetch", "control file".] 

Figure 2: Exporting statistics from an MPI application and importing them into the PCP MPI agent. 



AUUG2K - Enterprise Security, Enterprise Linux 

The MPI PMDA is integrated into the complete PCP collection framework on a single host as 
shown in Figure 3. The Performance Metrics Collection Daemon (PMCD) serves as the point of 
contact for a client application wishing to retrieve metadata about performance metrics or the 
values of performance metrics from a host. PMCD knows very little about the actual performance 
data, but rather acts as a co-ordinator and message router so requests pass to the relevant PMDAs. 
PMCD will present a single response to the client (hiding the existence of the PMDAs) so the 
clients believe all performance data can be retrieved by talking to one process (PMCD) and using a 
single API to unify the disjoint collections of performance data known to the individual PMDAs. 
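
The co-ordinator role can be pictured as a small dispatcher. The sketch below is a toy model only: routing by the first component of a dotted metric name, and the metric names and values themselves, are assumptions made for illustration and do not describe how PMCD identifies the responsible PMDA.

```python
class Router:
    """Toy co-ordinator: routes each metric request to the agent that owns it."""

    def __init__(self):
        self.agents = {}

    def register(self, domain, fetch):
        """Register an agent's fetch callable for one domain (name prefix)."""
        self.agents[domain] = fetch

    def fetch(self, metrics):
        """Return one merged response, hiding which agent supplied each value."""
        response = {}
        for metric in metrics:
            domain = metric.split(".", 1)[0]
            agent = self.agents.get(domain)
            # Metrics with no registered agent are reported as None.
            response[metric] = agent(metric) if agent else None
        return response

router = Router()
router.register("mpi", lambda m: {"mpi.calls": 42}.get(m))
router.register("kernel", lambda m: {"kernel.load": 0.5}.get(m))
reply = router.fetch(["mpi.calls", "kernel.load", "disk.reads"])
```

The client sees a single reply and never learns that two different agents supplied the values.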



Figure 3: Integration of the MPI agent into the PCP collector infrastructure. 

3.3 Aggregation Across Hosts 

The PCP protocols between the collection daemon (PMCD) and a monitoring application use 
TCP/IP and are hence distributed. Monitoring applications are able to establish connections to more 
than one PMCD, and fetch metrics from them in a sequence and at a frequency determined by the 
monitoring application. In this way, a single monitoring application can gather and process 
performance data (that may include much more than the MPI data, for example process resource 
utilization and platform activity) from selected ranks and selected nodes in the cluster, as shown in 
Figure 4. 

Figure 4: Monitoring an MPI application spanning multiple nodes. 

4. PCP Monitoring Tools 

4.1 Stripcharts for Time-series Analysis 

The simplest GUI presentation of complex performance data is via stripcharts that present a 
time-series of observations for collections of related performance data. Different charts may be 
crafted to show a variety of performance factors and correlations, for example: 

• The distribution of call frequencies or time spent across the MPI library routines for a single 
rank. 

• The distribution of call frequencies or time spent across the ranks for a single MPI routine. 

• The temporal correlation between MPI activity and non-MPI performance measures such as 
CPU utilization, system call or context switch rates, disk spindle utilization, VM paging, etc. 

Figure 5 shows examples of some of these options. 

Figure 5: Activity across ranks and across MPI routines. 

4.2 3-D Visualization of MPI Performance 

In some cases the temporal aspect is less important, e.g. the computation is steady state, or modal 
with long periods of steady state between mode transitions. If this is the case, and there are large 
numbers of performance metrics of interest, then using visual cognition and 3-D performance 
visualization can lead to deeper insight into the application’s behaviour. 

Figure 6 shows one such visualization in which the dynamic behaviour of each rank across the MPI 
routines is shown as a column of blocks. Note that by viewing the scene orthogonally, you can see 
the dynamic behaviour of each MPI routine across the ranks as a row of blocks. The square blocks 
are animated such that the height of an individual block is proportional to the time spent by one 
rank in one MPI routine (or group of routines in the other case). 

Figure 6: 3-D visualization of CPU time distribution and current state across the ranks of an MPI 
job. 


4.3 Monitoring Production Codes 

Because the PCP infrastructure exports real-time data that is reliable and timely, it is possible to 
construct monitoring tools that will "watch" production codes, or operational systems. These tools 
are usually rule-based and when a rule’s predicate evaluates true an associated action will be 
executed. 

The actions might be: 

• raise a visible alarm in a GUI tool, or 

• send e-mail, or 

• post a message to a management console, or 

• abort a job 
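
The predicate-and-action structure of such rules can be sketched in a few lines. This is an illustration of the idea only, not the rule language of the PCP inference engine; the rule names, thresholds and sample values are made up.

```python
def evaluate(rules, sample):
    """Run the action of every rule whose predicate is true for this sample."""
    fired = []
    for name, predicate, action in rules:
        if predicate(sample):
            action(name, sample)
            fired.append(name)
    return fired

alarms = []

def raise_alarm(name, sample):
    # Stand-in for a GUI alarm, e-mail or console message.
    alarms.append(f"{name}: rank {sample['rank']}")

# Hypothetical rules over per-rank utilization fractions for one interval.
rules = [
    ("barrier wait too high", lambda s: s["barrier"] > 0.5, raise_alarm),
    ("no application progress", lambda s: s["appl"] < 0.05, raise_alarm),
]

fired = evaluate(rules, {"rank": 3, "barrier": 0.7, "appl": 0.2})
```

With this sample only the first rule fires, because rank 3 spent 70% of the interval in the barrier state while still making progress in application code.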

PCP provides a powerful inference engine that could be customized to use the MPI performance 
data (in addition to the platform performance data) to provide a new set of automated, rule-based 
management tools for sites running production MPI jobs. 

4.4 Retrospective Analysis 

The PCP protocols treat a host and an archive as a semantically interchangeable source of 
performance metrics. Hence all of the PCP monitoring tools can be used for interactive real-time 
tasks, or retrospective analysis from previously captured PCP archives. 

Conclusions 

Based on PCP’s open interfaces and architecture that is extensible at all levels, we’ve been able to 
leverage the "wrapper" architecture of MPI libraries to provide a new and powerful set of tools to 
tune and manage the execution of MPI jobs. 


Introduction 

One of the primary and most demanding roles of system administration is ensuring the services 
used by the user community are highly available. It is important to note that users do not care 
about machine or network reliability, only service reliability. They do not care if your servers 
are fully operational, providing excellent response time for other services if the service they want 
to use is not operational. 

The end user services provided by IT solutions are based on many layers. At the bottom 
is the physical hardware. Over the hardware layer sits the operating system and finally the 
application. Network availability also affects end user services but is beyond the scope of this 
paper. 

All of these layers must be fully operational for services to be available. Failure in any 
one layer will usually result in a service outage. In addition to actual failure any layer may be 
unavailable for other reasons, such as upgrades or scheduled maintenance. 

Hardware has become increasingly reliable, and in addition there exist relatively cheap 
solutions for the most common forms of hardware failure such as disk drives and power supplies. 
It is worth remembering that although hardware is not a common form of system failure, the 
repair time is usually much longer than for failures in the higher layers since failure will often 
require the intervention of a human and possibly the acquisition of parts. 

Operating systems (at least the good ones) have also become relatively reliable. It is not 
uncommon for machines to stay up and fully functional for weeks, months or even years. 
However, there are still cases where a normally reliable OS can suddenly become unreliable due to 
a previously unnoticed bug being exercised by an application or mix of applications. 

Applications are a mixed bag, with some being extremely reliable while others can cause the 
system administrator some headaches. However, the sheer number of applications often present 
almost guarantees there will be problems with some services from time to time. 

My experience of recent years is that more than 90% of unscheduled outages of services are 
caused by software failure (either the OS or the applications) rather than hardware failures. 

IT centers have tended to focus on hardware reliability. RAID disk, redundant power, 
hot swappable CPUs etc. are examples of attempts to make service provision more available by 
focussing on making the hardware more reliable. Clustering is a solution to improved reliability 
at the hardware and OS layers, but has limited usefulness for improving application reliability. 

Traditional approaches to reliability also provide the system administrator with little relief 
from problems requiring scheduled outages, such as hardware or OS upgrades. 


Service Focus 


As mentioned previously, users are only really concerned about the actual services they are using. 
They don’t care how a service is provided as long as it is available. The system administrator 
is thus free to examine alternative ways to provide reliable services, without the traditional 
constraints of focussing on hardware and OS reliability. 

A relatively recent tool available in this area is the layer 4/7 switch, sometimes called a local 
redirector. 


Layer 4/7 Switching 


A layer 4/7 switch (L47S) is a network device similar to a traditional ethernet switch in 
construction and appearance. It performs the same functions as a traditional switch and can be 
used only for that purpose if desired. 

However, the switch software offers many additional features, many of which are outside the 
scope of this paper. I will focus entirely on “Server Load Balancing” (SLB), which 
is one of the more interesting capabilities of such devices. 

The device basically functions as a proxy or virtual server for the hosts providing the real 
services. Users connect to the L47S as though it were a real server. The switch maintains internal 
state information about the status of the servers providing the real services and switches the 
user’s request through in a transparent fashion to one of the real servers. 

The L47S will detect when servers are down by loss of layer 3 connectivity (servers stop 
replying to ICMP requests) and mark the server as “down” in its internal tables. All future 
connections to services offered by that real host are redirected to alternative servers supplying 
the same network services. 

For each service, the device will also monitor each individual service (port) by attempting 
to open a connection. If a port becomes unavailable, the switch marks that service as “down” 
on that server. Future requests for that service will be routed to alternative hosts offering the 
same service. 

For selected services, which include all the commonly used ones, the switch can also perform 
checks at layer 7 (the application layer). Thus for an IMAP port, the switch will attempt to 
make a valid IMAP connection. If the server does not respond correctly the service is marked 
as down. This handles the case where an application becomes unusable, but is still running and 
holding the server side port open. 
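
The difference between the port check and the application check can be shown with two small probes. This is a sketch of the idea, not the switch's implementation: the function names and timeouts are hypothetical, and the layer 7 probe tests only that the server's greeting begins with "* OK", as an IMAP server's normal greeting does. A real check might go further.

```python
import socket

def port_is_open(host, port, timeout=2.0):
    """Layer 4 check: can a TCP connection to the port be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def greeting_ok(host, port, expected=b"* OK", timeout=2.0):
    """Layer 7 check: does the server send the expected greeting?"""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.settimeout(timeout)
            return s.recv(64).startswith(expected)
    except OSError:
        return False
```

A hung server that still holds its port open passes port_is_open but fails greeting_ok, which is exactly the case the layer 7 check exists to catch.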

In cases where machines or applications fail, the switch will relatively quickly detect the 
problem and route all future connection requests to alternate servers. Open TCP connections 
will be broken, however, so the system may not be applicable to all types of services. 

For most types of services, however, a broken connection will merely prompt the client to 
reconnect, so the user experiences only a short outage. 

There are several manufacturers of L47S devices. We chose a Foundry “ServerIron” switch. 


Service Replication 


The deployment of a L47S for service provision assumes that it is possible to arrange for services 
to be replicated across some number of real servers. In some cases, this is trivial since services 
such as SMTP have little or no data associated with them. For other types of service however, 
there is a significant data requirement which is vital to the service being offered. WWW services 
are an example of this class of service. 

In the latter case, there are 2 solutions. The first is to replicate the data required between all 
the real servers providing the service. This is only feasible if the service basically only requires 
read only access to the data. 

The other alternative is to separate the storage for the services from the actual machine 
providing the service. This approach is only feasible where the service can operate with multiple 
instances of the server accessing the same data. 

If neither of the above cases applies, the service is not a candidate for L47S replication. 

We chose the latter approach and purchased a highly reliable network attached storage device 
to handle our data requirements (a Network Appliance filer). The filer contains all the data for 
all services we run and provides the highly reliable back end storage which allows the front end 
servers to run replicated services. 

The following diagram (Figure 1) shows the overall configuration. 

Figure 1. 



The dotted lines in the above diagram represent additional hardware we would require to 
ensure there are no single points of failure in our system. Even without these additional items, 
the system offers high availability since the single points of failure are very stable devices. 
For example, the filer is a dedicated device, performing a relatively simple service. The code 
complexity is considerably less than a general purpose server. In addition, the filer contains 
many of the traditional features to improve hardware reliability such as redundant power and 
RAID disk. 

We believe that the configuration we have chosen provides adequate reliability for our needs 
at a cost we can afford. In addition, the single points of failure can be easily removed by 
the addition of extra hardware. Both the NetApp filer and the Foundry Layer 4/7 switch can be 
clustered to provide failover in case of hardware or software failure. 

Currently we are running the following services through the L47S: DNS, SMTP, IMAP, 
POP, WWW, WWW proxy, RADIUS, RDATE and NTP. 


Gotchas 

Configuration of the switch needs to be done carefully. In particular, the administrator must 
pay careful attention to the port parameters which control aspects of the switch operation such 
as how long the switch remembers a connection (the port age parameter). In many cases
it is necessary for subsequent connections from a host to return to the same real server, so too
short a timeout on the port age results in subsequent connections by the same user being sent
to a different real server. This can cause unexpected results. A good example of a problem in
this area is any form of WWW connection where the server maintains state between HTTP
connections.

Another problem we encountered early was that a service can fail and the system administrator
can be unaware of any problem. Users notice no outage and hence fail to report the problem. 
Increased monitoring of hosts or SNMP monitoring of the switch can solve this problem. 


Conclusion 


It is now possible for system administrators to stop focussing on making any single piece of 
hardware or software reliable and instead focus on service availability, which is what users 
actually perceive. 

Tools such as a layer 4/7 switch and/or highly available network attached data storage 
devices allow the system administrator to build solutions whereby the effects of a failure in any 
component, be it hardware or software, result in minimal impact to users. These devices are 
relatively cheap compared to alternative solutions such as fault tolerance or clustering and can 
be used for many of the common services which require high availability. 

In addition, the life of the administrator is made significantly easier since no server must be 
kept up at all times. Scheduled upgrades etc can be easily handled by temporarily instructing 
the switch to route services to the alternative servers while the maintenance is performed.


Writing portable kernel code

By Darren Reed 

Abstract 

The development of traditional software packages for multiple platforms is a well understood problem 
today, with packages such as autoconf/automake used by many software authors to aid in setting up 
their software so that it can deal with the myriad of platforms currently in use. When writing software 
that is to be compiled into the kernel, the programmer cannot make use of such configuration scripts,
nor is it desirable to use header files that are not generic in name and purpose. This paper
describes some approaches that can be taken to solve this problem. 


Introduction 

In the beginning, I started out writing IP Filter 
for SunOS4. In the context of this paper, all 
that is significant is that IP Filter is a pseudo 
character device that can either be used as a 
Loadable Kernel Module (LKM) or built into 
kernels (static linking), where available. At 
that time, Linux was not a significant player, 
Solaris was still struggling with reliability and 
the BSD’s were largely unknown. As time 
progressed, so did the desire of people to use 
IP Filter on different platforms. Apart from 
some very different methods for getting it to 
interface with these systems, there were very 
few specifics in the kernel, unlike what one
must go through when developing
software that must deal with tty’s directly
through termio(s).

One of the primary goals was to be able to use 
the same code path for compiling in a kernel 
environment as user environment. With a few 
exceptions, this has been achieved. 

This paper discusses the lessons learned in the
process of porting it to different operating
systems. Along the way, this has also led to an
understanding of the “hidden differences”
between the different architectures.

Caveats 

For those writing device drivers for hardware
(network cards, etc), it is generally not possible
to write code in such a way that it can be used
in a multi-platform sense. The problem here
is that each platform has its own interface
for tasks such as reserving virtual or
physical address space for direct memory
access (DMA) by the driver, as well as for
others such as interrupt handling.


General Rules for Kernel Coding

When developing code to run in the kernel,
some of the basic things you must not forget
are the following:

• You MUST keep track of all your
memory allocations and free them when
appropriate. Even if you are an LKM,
memory allocated stays allocated until you
free it or the application terminates. In the
case of the application being the kernel,
this means until it either crashes or
reboots.

• Debugging a running kernel can be a 
painful process if you choose to step 
through it as kernel debuggers generally 
don’t step line by line but instruction by 
instruction. Remember that printf() is 
your friend although it is likely the 
formatting options will be more restricted. 
A warning about printf in the kernel:
whilst available on all platforms, it is not
presented as a Symmetric Multi-Processor
(SMP) safe function (there
being other alternatives which are) and
may have limitations on formatting.

• If you introduce an infinite loop, chances
are you will lock up your machine.
Hitting ^C or ^Z will not work like it
usually does.

• Bad pointers and bad pointer arithmetic 
can be very hard to pick up so be sure to 
always initialize variables and use 
compilers such that they will remind you 
of this. Unlike user applications, you will 
have a much larger valid address space 
and can stomp on things like disk buffers 
if you’re not careful. Remember the 
kernel equivalent of a core dump is a 
crash. 

• Be very stingy when using automatic
variables that are “buffers”. More often
than not, your kernel heap/stack space is
quite limited. This should serve as a
warning that using recursion should be
avoided where at all possible. If a sort or
search algorithm you’re using has both a
recursive solution and iterative alternative,
look to using the alternative even if it
means allocating/freeing more memory at
run-time.

• If the kernel is an SMP kernel then you 
really have no choice but to make all your 
code SMP safe. In the short term this may 
mean more work in debugging your 
locking, but in the long run it will pay off 
in performance dividends. It may well be 
that when you start out, only one or two of 
your target platforms is fully SMP, but as 
they change, your code is already ready. 

• If possible, make your code work as a 
LKM. In general, on those platforms 
which support LKM’s, it is easier to use 
this interface to hook your code into the 
kernel. There are two extremes with this: 
Solaris, where everything excluding core 
internal functions is in LKM’s (this 
includes network drivers, scheduling, file 
systems, etc) and platforms where it is not 
possible to use LKM’s due to restrictions 
in the CPU architecture. One such 
platform is NetBSD/arm32 - the current 
kernel design puts LKM’s outside of the 
address range the CPU can branch to. 
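
The first rule above, keeping track of every allocation, can be made mechanical. The following is a minimal sketch that builds as a user program; the names acct_alloc(), acct_free() and acct_outstanding() are hypothetical and not part of any kernel interface, and the counters are not SMP safe as written. In a kernel build the malloc()/free() calls would be replaced by the platform allocator.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping; a real module would protect these with a lock. */
static long acct_allocs;    /* allocations not yet freed */
static size_t acct_bytes;   /* bytes not yet freed */

/* Allocate and record.  A kernel build would call the platform allocator. */
static void *acct_alloc(size_t sz)
{
    void *p = malloc(sz);

    if (p != NULL) {
        acct_allocs++;
        acct_bytes += sz;
    }
    return p;
}

/* Some kernel free routines want the size back, so the caller supplies it. */
static void acct_free(void *p, size_t sz)
{
    if (p == NULL)
        return;
    free(p);
    acct_allocs--;
    acct_bytes -= sz;
}

/* Intended for the unload path: a non-zero result means memory was leaked. */
static long acct_outstanding(void)
{
    return acct_allocs;
}
```

Checking acct_outstanding() when the module unloads gives an early warning of a leak that would otherwise only show up as the kernel slowly running out of memory.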

Step 1: User and Kernel

The first step towards getting your kernel code
to compile on multiple platforms is to get it to
compile both as a part of a user application and
as a part of the kernel. This will generally
highlight where your operating system specific
interfaces will be and give you a guide as to
where your code separation will be. As an
example of this, you cannot expect to call the
usual stdio functions such as fopen(), fgets(),
gets(), etc, and may even find that normal
functions take different arguments. For
example, sprintf() might be the usual
sprintf(char *, const char *, ...) for user
programs but be sprintf(char *, size_t, const
char *, ...) in the kernel. The most obvious
obstacle that needs to be handled is that of
header files - /usr/include just does not exist
when you’re compiling kernel source, so either
you provide your own structures and functions
to imitate those missing or look for the kernel
alternatives.
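
One way to cope with the differing sprintf() arguments is sketched below. It is only an illustration: FMT_BUF and describe_rule() are hypothetical names, and KERNEL_SPRINTF_TAKES_SIZE is a made-up switch standing in for whatever per-platform test actually applies. The shared function is identical in both environments; only the macro changes.

```c
#include <stddef.h>
#ifndef _KERNEL
# include <stdio.h>
# include <string.h>
#endif

/*
 * FMT_BUF hides the difference in sprintf() arguments.
 * KERNEL_SPRINTF_TAKES_SIZE is an assumed, hand-set symbol.
 */
#if defined(_KERNEL) && defined(KERNEL_SPRINTF_TAKES_SIZE)
# define FMT_BUF(buf, len, fmt, arg)  sprintf((buf), (len), (fmt), (arg))
#elif defined(_KERNEL)
# define FMT_BUF(buf, len, fmt, arg)  sprintf((buf), (fmt), (arg))
#else
# define FMT_BUF(buf, len, fmt, arg)  snprintf((buf), (len), (fmt), (arg))
#endif

/* Shared code path: the same source for the kernel and for user programs. */
static void describe_rule(char *buf, size_t len, int rule)
{
    FMT_BUF(buf, len, "rule %d", rule);
}
```

Built as a user program, describe_rule() can be exercised with ordinary test tools before the code ever goes near a kernel.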

Step 2: Platform dependencies 

There are some parts of your code that are 
going to require platform dependent structures 
- they may even be version specific. This 
should be expected. If you’re developing an
LKM, then this will be your first section of
code that will be platform-dependent.

Macros 

There are two approaches to writing portable 
code: 

1. You can litter code with #ifdef
statements to conditionally call a function
this way or that; or

2. Keep usage of #ifdef’s limited to 
header files where you create generic 
macros that generate the correct code. 

The advantage of (2) is that the clutter is kept 
away from the mainline of your code, making 
it easier to read, easier to maintain and easier 
to audit. An example of these methods is as 
follows: 


1. A section of code from a .c file using #ifdef’s to correctly allocate memory.

#ifdef _KERNEL
# ifdef sun
#  if defined(__SVR4) || defined(__svr4__)
    a = kmem_alloc(sz, KM_NOSLEEP);           /* Solaris 2.x */
#  else
    a = new_kmem_alloc(sz, KMEM_NOSLEEP);     /* SunOS 4.x */
#  endif
# endif
# if BSD >= 199306
    MALLOC(a, sz, type_a *, M_NOWAIT);        /* 4.4BSD derived */
# endif
#else
    a = malloc(sz);                           /* user program */
#endif

2. A .h file with the #ifdef’s, creating macros, plus the resulting C code:

#ifdef _KERNEL
# ifdef sun
#  if defined(__SVR4) || defined(__svr4__)
#   define ALLOC(v,t,s) (v) = (t)kmem_alloc(s, KM_NOSLEEP)
#  else
#   define ALLOC(v,t,s) (v) = (t)new_kmem_alloc(s, KMEM_NOSLEEP)
#  endif
# endif
# if BSD >= 199306
#  define ALLOC(v,t,s) MALLOC(v, s, t, M_NOWAIT)
# endif
#else
# define ALLOC(v,t,s) (v) = (t)malloc(s)
#endif


The resulting code in your .c file will be:

ALLOC(a, type_a *, sz);

The cross platform Unix ugliness is hidden
away in a .h file - where it should be. There
are some instances when using #ifdef’s in your
.c files will be unavoidable - for instance,
making sure all the correct include files are
present.

Using per-platform include files (i.e.
solaris.h, netbsd.h, linux.h, etc) is fine for 
applications and also works well for LKM’s, 
but when compiling code into a static kernel, it 
makes little sense to have files with obscure 
and potentially conflicting names in the kernel 
source tree. 

There will be occasions when macros are not 
enough. This is generally restricted to 
situations where you are performing an action 
that is specific to a platform, such as sending a 
packet out onto the network. 

Step 3: Code Separation 

There are some code segments which are
sufficiently large and platform dependent that
even the most ambitious use of
macros is a bad idea. This will also be the case
for dividing some code between compiling for
user programs vs the kernel.

The most obvious case for this is the support 
necessary to create an LKM. The LKM is not 
only particular to the OS when it comes to 
code base but compilation and linking are also 
likely to vary. If your target is a number of 
related platforms (such as the various BSD’s), 
there may be some justification in splitting the 
code three ways: platform independent code,
platform dependent code and operating system
dependent code.


As a general rule, any code that forms the basis 
for the implementation of an algorithm, such 
as those used for cryptography/authentication, 
should be platform independent. 
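
A sketch of this separation follows. The platform independent part (a trivial additive checksum standing in for a real algorithm) knows nothing about how a packet leaves the machine; that step is reached through a small table of function pointers supplied by the platform dependent file. All names here (struct os_ops, check_and_send(), stub_send()) are hypothetical.

```c
#include <stddef.h>

/* Hypothetical interface filled in by each platform dependent file. */
struct os_ops {
    int (*send_packet)(const void *pkt, size_t len);
};

/* Platform independent: a simple additive checksum over the packet. */
static unsigned int pkt_sum(const unsigned char *p, size_t len)
{
    unsigned int sum = 0;

    while (len-- > 0)
        sum += *p++;
    return sum;
}

/* Platform independent logic that defers the OS specific step. */
static int check_and_send(const struct os_ops *ops, const unsigned char *pkt,
                          size_t len, unsigned int expect)
{
    if (pkt_sum(pkt, len) != expect)
        return -1;                      /* checksum mismatch: drop */
    return ops->send_packet(pkt, len);
}

/* User space stand-in for the platform layer, handy for testing. */
static size_t stub_sent;

static int stub_send(const void *pkt, size_t len)
{
    (void)pkt;
    stub_sent += len;
    return 0;
}

static const struct os_ops stub_ops = { stub_send };
```

The stub table lets the independent code be tested as an application, while each kernel port supplies its own table in its own source file.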

Tips 

Compiling 

As a general rule, you will want a separate 
Makefile per platform. Do not even bother 
trying to pretend that various Makefile flags 
will be sufficient. This at least will allow you 
to deal with the different implementations of 
“make” in a sane way without requiring people 
to use 3rd party tools such as gmake.

Debugging 

There is little point in worrying about whether 
your code works well on n platforms if it does 
not work at all. For this reason, it is 
imperative to get your basic algorithms right. 
Compiling code and testing it as an application 
is a significant step in being able to test your 
code, although this avenue will not always be 
available to you - for example, when writing a 
new filesystem type. 

It is common practice when developing code to 
run in the kernel to actively detect situations 
that should not occur and take action (i.e. 
panic) when these situations occur. For 
example, if you have a function to handle 
freeing up of a complex structure, you might 
wish to ensure that you always get passed a 
non-null pointer. 
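
Such a check can be written once as a macro so that the same source panics in the kernel and aborts in a user program. The sketch below is an assumption about how one might do it, not code from IP Filter: FATAL, REQUIRE and list_pop() are hypothetical names, and the header that declares panic() varies between platforms.

```c
#ifdef _KERNEL
/* panic() takes a printf-style format on the platforms discussed here. */
# define FATAL(msg)  panic("%s", (msg))
#else
# include <stdio.h>
# include <stdlib.h>
# define FATAL(msg)  do { fprintf(stderr, "%s\n", (msg)); abort(); } while (0)
#endif

/* Stop immediately if a condition that must hold does not. */
#define REQUIRE(cond, msg)  do { if (!(cond)) FATAL(msg); } while (0)

struct node {
    struct node *next;
};

/* Unlink the head of a list; a NULL list pointer here is a caller bug. */
static struct node *list_pop(struct node **headp)
{
    struct node *n;

    REQUIRE(headp != NULL, "list_pop: NULL list pointer");
    n = *headp;
    if (n != NULL)
        *headp = n->next;
    return n;
}
```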

Crash Dump Debugging

For those working with SunOS/Solaris kernels,
there is a book available called “Panic!”.
This book provides an excellent aid in
unraveling the information left behind in the
crash dumps, as well as serving as a primer for
anyone who wants to examine/modify
variables on a live system.

Unfortunately, the best tool available for crash
dumps on Solaris, that I’ve seen, is an internal
tool - “Crash Analysis Tool” (CAT). It
provides a summary of all threads running and
handles nested panics (which can be
misleading when stepping through with adb).

On BSD platforms, if you build your original 
kernel with “-g”, it will separate the symbol 
information during the linking process, leaving 
your kernel at only a few megabytes in size 
and the somewhat larger debugging 
information in the kernel compile directory. 
Debugging of LKM’s is not yet supported on 
any of the BSD platforms (there is no unified 
kernel symbol table provided). 

Locking 

On platforms where your code is running in an 
SMP kernel, it is essential to get your use of 
locking correct. For simple problems, such as 
acquiring a lock you already own, it is 
common for the kernel to panic. Some 
platforms will impose restrictions such as not 
being able to acquire write locks whilst not in a 
user context to ostensibly prevent deadlock 
situations in the device driver. Again, macros 
can save the day here - if your locking is all 
correct, you should be able to substitute the use 
of the missing locking interface with that 
which is available, if any. On BSD systems,
spl*() is still used to protect code
sections. In practice, this has led to me using
macros for both spl*() and the more complex
locking interfaces, with the respective macros
defined to be nothing depending on which isn’t
available. For example:

#if SOLARIS
# define MUTEX(x)   mutex_enter(x)
# define SPLNET(s)
#else
/* Else it is BSD */
# define MUTEX(x)
# define SPLNET(s)  (s) = splnet()
#endif


If you’re only using simple/spin locks (i.e. 
mutexes) then debugging locking is not a 
problem. If you’re using reader-writer locks or 
semaphores, guarding against acquiring the 
same lock more than once should be 
imperative. A step to solving this is to convert 
all your locks to be mutexes. 

Public/Private Interfaces 

As a general rule, in order to increase the 
portability of your code, you should stick to 
using published, public, interfaces. There is a 
large temptation, with Open Source kernels, to 
call function xyz_frob() from your own code 
because you have the source code, can see
how it works and yes, it’ll fit quite nicely into 
what you’re trying to do, meaning you don’t 
have to go to large amounts of trouble to use 
that other function. 

The largest impediment to this approach is the 
lack of documentation for kernel interfaces. 
To date, only Solaris provides a reasonably
complete manual page section for kernel
functions. Some platforms do not provide man
pages at all, but large PDF documents, and others,
such as Linux, expect you to know which
HOW-TO is relevant for you and read that.
The de-facto standard section for kernel
manual pages is 9.

If documentation in any form is lacking,
resorting to scanning the kernel’s symbol table 
is the only alternative, along with grep’ing 
through /usr/include/sys for prototypes. 

Performance 

It is very tempting, as a kernel programmer, to
want to hand optimize everything to the nth
degree. This approach to coding can be
somewhat self-defeating later, as the highly
optimized code is neither easy to read nor
easy to maintain safely. Compilers these days are
getting large enough and ugly enough to take
care of this if they are instructed to. An
example of this is using Sun’s unbundled C 
compiler that can be instructed to 
automatically inline functions if and where 
possible. This removes any onus on the 
programmer for having unnecessary
__inline__ statements in the code -
something that may not be recognised by all
compilers on all platforms. Personally, I
encourage the use of native compilers on their
respective platform ahead of using gcc when
compiling code for the kernel. In some cases
the object modules generated may not be
compatible with that expected by the kernel.
An example of this is compiling a module for
Solaris with gcc: using “-g” on the gcc command
line will lead to a panic when Solaris attempts
to load the module.

For general tuning, access to tools such as
prof/gprof for kernel code is not common,
although there has been some progress in this
arena for Linux (newer kernels can be
compiled and run as a user application). In
situations where this is not possible, given
your code has been prepared to compile and
run in user space, traditional methods of tuning
are easily accessible. The task then remains to
construct a testing environment which
simulates expected load.

Difficulties 

Linux 

The biggest problem that currently exists in
trying to write portable kernel code is getting it
to compile for the Linux kernel and any other
system. The primary cause for this is that
Linux is only a kernel, whereas if you actually 
want to use Linux, you need a collection of 
tools, starting with init and /sbin/sh. This 
separation is felt most when it comes to 
include files - glibc provides the entire 
contents of /usr/include and is not compatible 
with the kernel include files! This strategy, 
used by Linux distributions, is enough to make 
all of your previous efforts in creating portable 
code feel worthless when it comes to Linux. 
There are other philosophical problems with 
Linux, such as the constant changing of 
internal interfaces (which is felt by some not to 
be a problem so much as it discourages the 
distribution of binary only modules for Linux). 
In developing kernel code for Linux, it is 
almost a case of choose Linux or everyone 
else. 

In comparison, all other platforms use a single 
tree of header files for /usr/include, where 
those in /usr/include/sys (if not other 
directories) come directly from the kernel 
source tree. 

Operating System Version 

For the sake of including the correct include 
files, etc, and not being able to use 
autoconfigure, it is desirable to know which 
version of the operating system you're 
compiling under. It has taken time, but the
various Open Source projects now seem to
understand this and will provide a symbol,
such as __NetBSD_Version__, that can be used
to determine the correct age of the kernel
source being used. Unfortunately commercial
platforms do not appear to recognise this,
leaving the programmer to develop their own
method(s).
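
As an illustration of using such symbols, the sketch below selects a code path from the version symbols that NetBSD and FreeBSD define in <sys/param.h>. PORT_OS and PORT_NEW_IFACE are hypothetical names, and the version thresholds are placeholders chosen for the example, not the dates of real interface changes.

```c
#include <sys/param.h>   /* BSD systems define their version symbols here */

/*
 * PORT_OS and PORT_NEW_IFACE are made-up names for this example.
 * The numeric thresholds are placeholders only.
 */
#if defined(__NetBSD_Version__)
# define PORT_OS "NetBSD"
# if __NetBSD_Version__ >= 105000000
#  define PORT_NEW_IFACE 1
# endif
#elif defined(__FreeBSD_version)
# define PORT_OS "FreeBSD"
# if __FreeBSD_version >= 400000
#  define PORT_NEW_IFACE 1
# endif
#elif defined(__linux__)
# define PORT_OS "Linux"     /* kernel code would test LINUX_VERSION_CODE */
#else
# define PORT_OS "unknown"   /* fall back to a hand-maintained symbol */
#endif

#ifndef PORT_NEW_IFACE
# define PORT_NEW_IFACE 0
#endif

/* Report which branch the preprocessor selected. */
static const char *port_os(void) { return PORT_OS; }
static int port_new_iface(void) { return PORT_NEW_IFACE; }
```

Keeping tests like these in one header means the .c files only ever ask whether PORT_NEW_IFACE is set, not which release introduced it.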


ABSTRACT

A number of generic benchmark programs are available for measuring various aspects of
system performance. Some of these, such as iozone and bonnie, measure I/O performance,
and are often used to measure the performance of mass storage devices such
as disks and RAID arrays. They suffer from the disadvantage that they measure overall
system I/O performance, and not only the storage subsystem. Thus the results of bonnie
depend not only on the storage device, but also on the system on which it runs. In extreme
cases it is possible to perform tests which do not transfer data to the storage device
at all.

rawio is a benchmark program specifically designed to measure the storage device alone. 

This paper discusses how to use rawio to test and compare mass storage devices, and 
what pitfalls may occur in so doing. 

Introduction 

“Lies, damn lies and benchmarks”. Benchmarks are best known for proving that vendor
A’s product is better than vendor B’s (this is vendor A’s benchmark; vendor B, of
course, has another benchmark which proves the opposite). This state of affairs may be
amusing, but it makes it very difficult to make objective comparisons of storage subsystem
performance. In the course of writing the Vinum volume manager [Vinum], I needed
to make objective comparisons between conventional disk drives, hardware RAID arrays,
and software storage devices such as Vinum and ccd. I discovered that no good benchmark
program was available for measuring storage subsystem performance.

Pitfalls, red herrings and the 1000 MB/s hard disk 

What about programs like bonnie and iozone? They purport to measure I/O performance,
but in fact they measure the file system, not the storage device. The real issue is
that they use interfaces supplied by the operating system, such as the C library (buffered
I/O) and the file system. These interfaces can place a significant load on the processor
and thus reduce the measured I/O throughput. For example, bonnie's character mode output
test tests the CPU speed, not the storage device. They may also cache data in memory.
Since this data can be accessed much faster than real mass storage devices, the benchmarks
can deliver unrealistically high results—the so-called gigabyte/s hard disk. While
this can be a valid performance measurement, it is not suitable for measuring relative
performance of storage devices.

Disk access issues 

Another misconception is the concept of “disk speed”. As IDE and SCSI transfer rates
pass the 100 MB/s rate, most people forget that real-life disk access is much slower.
Off-the-platter transfer rates seldom maintain speeds of over 30 MB/s. Even this value is far
beyond real life values.

The first question is: what is the system to be used for? In some cases, such as single-process
sequential access (for example, streaming video) the real speed is not much lower
than these values. In most large modern systems, however, the access patterns are different:
a large number of processes access different parts of the storage system concurrently,
usually with the intermediary of a file system layer.

As soon as multiple processes are involved, the throughput drops dramatically. Each
transfer now consists of a relatively long positioning phase—even the fastest modern
disks require more than 5 ms on average—followed by a relatively short data transfer
phase. For example, the UNIX file system (ufs) transfers on average blocks of between 6
and 10 kB. At a transfer rate of 20 MB/s, the transfer takes between 300 µs and 500 µs.
Unfortunately, the positioning overhead reduces the overall transfer rate by at least 90%:
the real random access throughput of a modern disk drive is seldom more than 2 MB/s.
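
The arithmetic behind these figures can be checked with a few lines of C. This is only a worked example of the numbers quoted above (5 ms positioning, 20 MB/s media rate, 6 to 10 kB transfers, with 1 MB taken as 1000 kB); effective_mb_per_s() is a hypothetical helper name.

```c
/*
 * Effective throughput in MB/s for one transfer of `kbytes` kB, given a
 * positioning time in milliseconds and a media transfer rate in MB/s.
 * With 1 MB = 1000 kB, kB divided by MB/s gives milliseconds, and
 * kB divided by milliseconds gives MB/s.
 */
static double effective_mb_per_s(double kbytes, double position_ms,
                                 double rate_mb_s)
{
    double transfer_ms = kbytes / rate_mb_s;

    return kbytes / (position_ms + transfer_ms);
}
```

For an 8 kB block the transfer itself takes 0.4 ms, so the whole operation takes 5.4 ms and the effective rate is about 1.5 MB/s, a reduction of more than 90% from the nominal 20 MB/s. The 6 kB and 10 kB cases give roughly 1.1 MB/s and 1.8 MB/s.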

This doesn’t look good, does it? And it’s not what the spec sheets say. It’s not even what 
the benchmarks say. But it is reality. 

Solving the problem 

What do people do about this problem? 

• Users write benchmarks which show off storage devices to their best advantage. This
is not a problem as long as storage devices are of a similar nature; the net result is
that they apply a “fudge factor” to the real performance. When comparing RAID arrays,
volume managers and conventional disks, though, the differences in implementation
are sufficient to render meaningless any comparison of the results of these benchmarks.

• Vendors add cache memory to the drives, which can speed up read access significantly.
A relatively common access mode, seen for example in ftp servers, is for multiple
processes to read “sequentially”. The drives perform read-ahead by the expedient of
reversing the sector order on the drive and starting to read as soon as they are on track.
Thus they can perform a read-ahead of up to half a track on average before the required
data passes the head. This can give great performance improvements in some
cases, since subsequent reads do not need to access the disk surface.

How rawio tries to be honest

No user program under UNIX can access disk hardware directly, but rawio tries to come
as close as possible: it uses the raw interface to the disk drive, thus bypassing the buffered
library routines, the file system and buffer cache. It addresses the question of multiple
process access by starting a user-configurable number of concurrent processes, each of
which accesses a different part of the device. This brings it closer to real application
access patterns than other benchmarks.
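
The general technique can be sketched as follows. This is not rawio's source code, only an illustration under stated assumptions: random_read_test() is a hypothetical function, timing is omitted, and a real raw device may impose alignment and sector-size requirements on the buffer and transfer size that this sketch does not check. Each child opens the device itself and issues unbuffered pread() calls at random offsets.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Fork `nproc` children; each performs `count` reads of `xfer` bytes at
 * random offsets (multiples of `xfer`) within the first `devsize` bytes
 * of `path`.  Returns the number of children that completed every read,
 * or -1 if the arguments make no sense.
 */
static int random_read_test(const char *path, int nproc, size_t xfer,
                            int count, off_t devsize)
{
    int started = 0, ok = 0;
    off_t slots;

    if (nproc <= 0 || xfer == 0)
        return -1;
    slots = devsize / (off_t)xfer;
    if (slots <= 0)
        return -1;

    for (int i = 0; i < nproc; i++) {
        pid_t pid = fork();

        if (pid < 0)
            break;                          /* could not start more children */
        if (pid == 0) {                     /* child: do the transfers */
            int fd = open(path, O_RDONLY);
            char *buf = malloc(xfer);

            if (fd < 0 || buf == NULL)
                _exit(1);
            srandom((unsigned)getpid());    /* different sequence per child */
            for (int n = 0; n < count; n++) {
                off_t off = (off_t)(random() % slots) * (off_t)xfer;

                if (pread(fd, buf, xfer, off) != (ssize_t)xfer)
                    _exit(1);
            }
            _exit(0);
        }
        started++;
    }

    for (int i = 0; i < started; i++) {     /* parent: collect results */
        int status;

        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

A benchmark built on this would record the time before the first fork() and after the last wait(), then divide the bytes and transfers by the elapsed time.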

Factors affecting disk performance 

A number of factors influence disk performance: 

• Transfer size makes a significant difference when positioning is required between
consecutive transfers. This is obviously the case with true random access, but it can also
occur with “sequential” access.

• True sequential access requires that only one process access the drive. As soon as two 
processes access the disk, the operating system multiplexes between them, causing in 
effect a degenerate case of random access. 

• Modern disk drives attempt to cater for sequential accesses by caching data, both
when reading and when writing. 

• File systems seldom store data completely sequentially; instead, it is stored in blocks
or extents. Thus even when a single process appears to access a single file sequentially,
the file system may map these requests to wildly differing locations on the disk.
Thus the ufs file system will normally write blocks of 4 kB or 8 kB and will attempt
to store consecutive blocks in the same cylinder group. Storage space fragmentation
can make this impossible, however.

• In addition to file I/O, a file system makes accesses to metadata such as inodes. The
location of these accesses can cause significant hot spots in disk access. For example,
when creating or deleting a ufs file, the inode must be rewritten. The inodes are typically
stored close to one of the superblocks, which nowadays are located at offsets of
32 MB. This access pattern can have important significance for striped storage.

Probably the largest single difficulty to overcome is the difference between the single process
viewpoint and the operating system viewpoint. The single process sees only a subset
of the total number of transfers, making it possible to expect a sequential access pattern
when in fact the accesses are random.

Rawio tests 

rawio simulates multiple concurrent disk accesses by running a number of concurrent
processes, each of which performs part of the test. This is the same technique used by
bonnie. Unlike bonnie, however, which only uses this technique for one test, and always
starts three processes, rawio can start a variable number of processes, and the default
number is 8.

rawio performs four basic types of test: Random Read, Sequential Read, Random Write
and Sequential Write. When performing the sequential tests, by default each process
starts transferring from a different part of the disk, so the access is only pseudo-random.
In addition, rawio's write tests can perform a mixture of read and write operations, which
can confuse the cache algorithms of many drives.

A number of other parameters are configurable, including the size of the transfers, the 
amount of information printed about the tests, the number of transfers to perform, and 
which tests to perform. By default, rawio only performs the read tests: since it bypasses 
the file system, the write tests will overwrite any file system structure on the disk. 


Interpreting rawio results 

rawio produces output in three different formats. By default, the output looks like this. 


          Random read     Sequential read  Random write    Sequential write
ID        K/sec    /sec   K/sec     /sec   K/sec    /sec   K/sec     /sec
ad0h      1616.5   98     12215.0   746    2015.6   124    16396.6   1001


This printout shows an identifier (by default the name of the drive), the results of the four 
tests in kB/second, and the number of transfers per second. 
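
The two columns for each test are related: dividing the data rate by the number of transfers per second gives the average transfer size, which is a useful cross-check on what the benchmark actually did. The helper below is a hypothetical convenience for that calculation, not part of rawio.

```c
/*
 * Average transfer size in kB from the K/sec and /sec columns.
 * Returns 0 if no transfers were reported.
 */
static double avg_transfer_kb(double kb_per_sec, double transfers_per_sec)
{
    if (transfers_per_sec <= 0.0)
        return 0.0;
    return kb_per_sec / transfers_per_sec;
}
```

For the random read figures above, 1616.5 divided by 98 is roughly 16.5 kB per transfer; the result is approximate because the transfers per second column is printed as a whole number.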

With the verbosity flag setting -v 1, the output looks like this: 


Test  ID    K/sec    /sec  %User  %Sys  %Total
RR    ad0h  1563.8   99    0.1    3.9   4.0     2048
SR    ad0h  17067.3  1042  0.0    10.4  10.4    2048
RW    ad0h  2019.2   126   0.3    1.6   1.9     2048
SW    ad0h  11797.3  720   0.0    6.5   6.5     2048


In this example, each test also shows the CPU time usage and the number of transfers. 
Finally, -v 2 produces the following output. 


Test name:      ad0h
Transfer count: 16384
Record count:   2048
Process count:  8
Device size:    10000000000

Test  ID    Time       KB/sec   /sec  %User  %Sys  %Total  Reads  Writes
RR    ad0h  20.868870  1559.8   98    0.0    1.5   1.5     2048   0
Child 0 reading from 8387103232
Child 1 reading from 1656464384
Child 2 reading from 4540458496
Child 3 reading from 9713902592
Child 4 reading from 4467875328
Child 5 reading from 9293569536
Child 6 reading from 5866543616
Child 7 reading from 4392658432
SR    ad0h  3.427620   9789.4   597   0.0    5.3   5.3     2048   0
RW    ad0h  16.056036  2028.2   128   0.1    2.1   2.2     0      2048
Child 0 writing to 1783481856
Child 1 writing to 1656287232
Child 2 writing to 5602985984
Child 3 writing to 6107997184
Child 4 writing to 1590989312
Child 5 writing to 2964428800
Child 7 writing to 6701552640
Child 6 writing to 6377219072
SW    ad0h  2.049101   16375.2  999   0.6    8.7   9.3     0      2048


This output format also shows the size of the transfer, the number of processes, and the
size of the device. In the case of the “sequential” transfers, the location at which each
process starts the sequential transfer is also shown.

The first time you run a test with rawio, the results can be disappointing. Consider an 
older drive, in this case a Quantum XP34301. rawio shows the following results: 


          Random read     Sequential read  Random write    Sequential write
ID        K/sec    /sec   K/sec     /sec   K/sec    /sec   K/sec     /sec
da1e      1271.7   79     1642.0    100    951.6    59     1212.2    74


The same drive showed the following results with bonnie : 


              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
disk1    1500  5122 26.8  5121 12.0  2093  6.9  4703 22.8  4720  8.2  82.1  2.6


According to rawio, this drive can transfer data serially at about 1.4 MB/s. According to 
bonnie, it’ll do about 5 MB/s. Why does rawio perform so much worse? 

Of course, it’s not a question of the performance of the benchmark: the benchmark must report
the behaviour of the device. The question here is what transfers are being performed.
bonnie and rawio perform different tests: in fact, the only test that is comparable
with rawio’s tests is bonnie’s “random seeks” test. Unfortunately, bonnie does not
show the rate of data transfer, only the number of transfers per second. rawio shows this
value too: 79 per second for reads, 59 per second for writes. bonnie shows 82 transfers.
bonnie is presumably performing reads, and the values are close enough together to assume
that both programs are, in fact, getting the same results.

What about the sequential I/O? There are two main differences: 

• We’ve already noted that there are different ways of looking at the term “sequential”.
bonnie performs the sequential tests with only one process, so the transfers are really
sequential. rawio can do this too, by specifying only a single process for the tests:

          Random read     Sequential read  Random write    Sequential write
ID        K/sec    /sec   K/sec     /sec   K/sec    /sec   K/sec     /sec
da1e                      3550.0    217                    3103.9    189


This still isn’t as fast as bonnie. If we look at what’s going on at the disk driver level,
with the aid of iostat¹, we get the following for rawio:

      da1
 KB/t tps  MB/s
16.00 232  3.62
16.00 234  3.65
16.00 232  3.62
16.00 233  3.64


1. In the iostat examples, iostat prints out results every second. The output has been trimmed. The 
three columns show the average size of the transfer in kilobytes, the number of transfers per second, 
and the data rate. 
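
The three iostat columns are related: the data rate is the transfer size multiplied by the number of transfers per second. The following Python sketch checks this against the first sample above. It assumes, as the figures suggest, that iostat counts 1 MB as 1024 KB; the function name is purely illustrative.

```python
def iostat_mb_per_sec(kb_per_transfer, transfers_per_sec):
    """Data rate in MB/s implied by the KB/t and tps columns.

    Assumption: 1 MB = 1024 KB, which is what the sample figures suggest.
    """
    return kb_per_transfer * transfers_per_sec / 1024.0


# First sample of the rawio run: 16 kB transfers at 232 transfers per second.
print(f"{iostat_mb_per_sec(16.00, 232):.3f} MB/s")
```

The result is close to the 3.62 MB/s that iostat reports for that sample.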

By contrast, bonnie shows:

       da1
KB/t   tps  MB/s
62.50  75   4.59
64.00  75   4.71
64.00  74   4.64
62.46  73   4.47
64.00  70   4.39


In other words, bonnie is performing 64 kB transfers. Or is it? In fact, it’s the C
library which is performing these transfers. On another platform, or even with different
disks, the size of the transfer might vary. But rawio can specify a transfer size. A 64
kB transfer produces the following results:

          Random read      Sequential read  Random write     Sequential write
ID         K/sec  /sec      K/sec  /sec      K/sec  /sec      K/sec  /sec

da1e                       3906.8    60                      4843.7    74

These results are surprising: firstly, they are still below the bonnie results, and
secondly the write performance appears to have improved much more (56%) than the read
performance (10%).
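
The two percentages can be checked from the single-process results with the default transfer size and with 64 kB transfers. This is only a worked calculation in Python; the helper name is illustrative.

```python
def improvement_percent(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# Sequential read and write throughput (K/sec), default size versus 64 kB.
seq_read = improvement_percent(3550.0, 3906.8)
seq_write = improvement_percent(3103.9, 4843.7)

print(f"sequential read:  {seq_read:.0f}%")
print(f"sequential write: {seq_write:.0f}%")
```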

Again in this case, iostat shows some interesting results:

       da1
KB/t   tps  MB/s
64.00  44   2.72
64.00  43   2.66
64.00  43   2.66
64.00  44   2.72
64.00  44   2.72
64.00  44   2.72
64.00  43   2.66
64.00  44   2.72
...
64.00  80   5.01
64.00  75   4.70
64.00  72   4.52
64.00  72   4.52
64.00  78   4.89
64.00  71   4.46
64.00  83   5.20


This particular drive appears to have very different transfer characteristics for different 
parts of the drive. This is an area where rawio currently cannot help: further research 
is needed. 


Transfer characteristics 

One of the more interesting things about rawio is the ease with which transfer
characteristics can be changed. This enables graphing the behaviour of storage devices
under different circumstances. The following sections describe a number of possible
investigations.

Effect of transfer size on throughput

When performing transfers in a purely sequential manner (no mixture of reads and
writes, only one process), the size of a transfer is normally not significant. An
exception is when the system cannot issue subsequent requests before the head has passed
the sector. In this case, extreme performance degradation, up to 99%, can result.

With random transfers, including the “pseudo-sequential” case, the effect is very
different. Typical modern disks have total latency of between 5 ms and 10 ms. Transfer
rates from the disk surface are in the order of 5 MB/s to 30 MB/s. Theoretical transfer
times t (in ms) and throughput r (in kB/s) for various transfer sizes are given by the
formulae

x = S * 0.512 / R
t = l + x
r = R * x / t * 1000
n = 1000 / t

Where

Variable  Purpose
l         Total latency, milliseconds.
n         Number of transfers per second.
R         Raw disk transfer rate, MB/s.
r         Measured disk transfer rate, kB/s.
S         Transfer size, sectors of 512 bytes.
t         Total transfer time, milliseconds.
x         Raw transfer time, milliseconds.

In these calculations, kB represents 1,000 bytes, and MB represents 1,000,000 bytes. 

A typical IDE drive, such as the Western Digital WDC WD205BA, has 20 MB/s 
transfer rate and 9 ms total latency. For this drive, the theoretical throughput would 
be: 


Transfer size  Transfer time  Transfers  Throughput
(sectors)      t (ms)         /s         r (kB/s)
1              9.5            105        53.7
2              9.6            105        107.2
4              9.6            104        213.3
8              9.7            103        422.1
16             9.9            101        826.7
24             10.1           99         1214.9
32             10.3           97         1587.7
64             11.1           90         2941.9
128            12.8           78         5129.3
256            16.1           62         8164.6
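
The formulae above can be evaluated directly. The following Python sketch is only an illustration, with function and parameter names chosen here. Note that the tabulated figures are reproduced with a total latency of 9.5 ms; this is an assumption inferred from the table, since the text quotes 9 ms, which gives slightly higher throughput values.

```python
def theoretical(sectors, latency_ms, raw_rate_mb_s):
    """Return (t, n, r) for a transfer of `sectors` 512-byte sectors."""
    x = sectors * 0.512 / raw_rate_mb_s      # raw transfer time, ms
    t = latency_ms + x                       # total transfer time, ms
    r = raw_rate_mb_s * x / t * 1000         # throughput, kB/s
    n = 1000 / t                             # transfers per second
    return t, n, r


# latency_ms=9.5 is an assumption that matches the table; R is 20 MB/s.
for s in (1, 2, 4, 8, 16, 24, 32, 64, 128, 256):
    t, n, r = theoretical(s, latency_ms=9.5, raw_rate_mb_s=20.0)
    print(f"{s:4d} {t:6.1f} {n:5.0f} {r:8.1f}")
```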


For this drive, rawio measures:

# rawio -x
# for i in 1 2 4 8 16 24 32 64 128 256; do
++ rawio -h -F -c $i -n 2048 -I $i -R -W -s 10000000000 /dev/rad0g
++ done
          Random read      Sequential read  Random write     Sequential write
ID         K/sec  /sec      K/sec  /sec      K/sec  /sec      K/sec  /sec
1           53.1   104                        62.2   122
2          106.7   104                       130.2   127
4          210.7   103                       258.4   126
8          419.8   102                       511.7   125
16         818.3   100                      1012.6   124
24        1225.3   100                      1475.0   120
32        1590.7    97                      1926.8   118
64        2958.4    90                      3799.0   116
128       5224.8    80                      4986.5    76
256       5665.1    43                      7392.7    56


There are some discrepancies between the theoretical and measured results: 

• Random writes are faster than random reads. This is probably due to write caching 
in the drive. 

• The results for 256 sector reads are well below expectations. Examining the
transfers with iostat shows that the transfers are in fact being converted into dual 64 kB
transfers. As a result, the transfer rates are not noticeably different from the results
for 128 kB transfers.
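
The size of the discrepancy can be expressed as a ratio of measured to theoretical random read throughput. The short Python sketch below uses a few rows copied from the two tables above; the dictionary and function names are illustrative.

```python
# Throughput in kB/s, keyed by transfer size in sectors (values from the tables).
theory = {1: 53.7, 16: 826.7, 64: 2941.9, 128: 5129.3, 256: 8164.6}
measured_read = {1: 53.1, 16: 818.3, 64: 2958.4, 128: 5224.8, 256: 5665.1}


def ratio_percent(sectors):
    """Measured random read throughput as a percentage of the theoretical value."""
    return measured_read[sectors] / theory[sectors] * 100.0


for s in sorted(theory):
    print(f"{s:4d} sectors: {ratio_percent(s):5.1f}% of theoretical")
```

Up to 128 sectors the measurements stay within about 2% of the theoretical values; only the 256 sector reads fall far short.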


Effect of number of concurrent processes 

We have already discussed the effect of multiple concurrent “sequential” accesses.
With the aid of the -p option, rawio can show results for a different number of
concurrent processes. In this example, the ID specifies the number of processes.


# rawio -x
# for i in 1 2 4 8 16 24 32 64 128 256; do
++ rawio -h -p $i -n 2048 -I $i -a /dev/rda1e
++ done
          Random read      Sequential read  Random write     Sequential write
ID         K/sec  /sec      K/sec  /sec      K/sec  /sec      K/sec  /sec
1          893.4    55     3573.4   218      883.7    55     5561.5   339
2         1012.2    61     2867.0   175     1018.2    62     1890.3   115
4         1127.8    70     1616.9    99     1142.7    70     2063.5   126
8         1249.9    77     1640.6   100     1263.4    78     1546.8    94
16        1367.2    84     1422.3    87     1359.8    84     1360.0    83
24        1386.1    85     1442.1    88     1326.9    85     1768.9   108
32        1353.2    84     1414.7    86     1378.3    84     1498.1    91
64        1351.6    84     1426.6    87     1345.7    85     1628.4    99
128       1350.0    86     1462.4    89     1351.5    84     1439.8    88
256       1401.1    85     1474.6    90     1364.4    85     1504.5    92

This series of tests shows a number of points:

• The random I/O performance increases with the number of processes. This is
because with increasing number of outstanding requests, the disk driver is able to sort
the requests in a sequence which allows faster positioning.

• The “sequential” I/O performance decreases with the number of processes, since 
more positioning is needed (the access pattern becomes more random). 

• The “sequential” I/O performance remains above the random I/O performance. 

rawio has an option -f which specifies that all sequential I/O transfers should take 
place from the same place on disk. This is effectively a test of the on-disk cache, 
since most transfers can be satisfied from on-disk cache. The results frequently look 
spectacular, but are somewhat meaningless. 

Mixing reads and writes

Traditionally, individual benchmark tests perform either writes or reads, but seldom a 
combination of both. This is a significant omission: disk drive cache behaviour can be 
strongly dependent on the transaction mix. The rawio random write test can perform 
such a mixture: the option -W 35 specifies that the I/O accesses should consist of 
35% writes and 65% reads. Note that -W 0 means 0% writes and 100% reads, and is 
thus equivalent to -R. 
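
One way to picture such a transaction mix is to draw a random number for each access and compare it with the requested write percentage. The Python sketch below illustrates the idea only; it is not taken from the rawio source, and the function name and seed are hypothetical.

```python
import random


def choose_operations(count, write_percent, seed=0):
    """Return a list of 'read'/'write' choices with roughly write_percent % writes.

    Illustrative only: rawio's own selection logic may differ.
    """
    rng = random.Random(seed)
    return ["write" if rng.random() * 100 < write_percent else "read"
            for _ in range(count)]


ops = choose_operations(10000, 35)
print(f"writes: {ops.count('write') / len(ops):.1%}")
```

With a write percentage of 0 this sketch produces only reads, matching the equivalence with -R noted above.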


Repeatable random access 

rawio uses the random() function to generate random numbers for the random
test. This is not ideal: seemingly small differences in access patterns can cause
significant differences in throughput. In particular, it makes it very difficult to get
repeatable results.

One attempt to minimize this effect is to use the same random numbers every time.
rawio includes a table of random numbers which will be used instead of random()
if the -S flag is used. There are significant objections to this approach, and it remains
to be seen whether the approach has any advantage.
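
The effect of reusing the same random numbers can be sketched in Python. This example seeds a generator rather than reading from a built-in table as rawio does, so it is an analogy under that assumption, not a description of rawio's implementation; all names are illustrative.

```python
import random

SECTOR = 512  # bytes per sector


def random_offsets(count, device_size, seed=None):
    """Return `count` sector-aligned byte offsets within the device.

    With a fixed seed the same sequence is produced on every run.
    """
    rng = random.Random(seed)
    sectors = device_size // SECTOR
    return [rng.randrange(sectors) * SECTOR for _ in range(count)]


first = random_offsets(5, 1648000000, seed=42)
second = random_offsets(5, 1648000000, seed=42)
print(first == second)
```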

Using rawio on RAID arrays 

rawio was written with the express purpose of testing storage devices other than
simple disks, in particular the Vinum volume manager. Many concepts change when
more than one spindle is involved: in particular, increasing the number of concurrent
processes can significantly increase performance. The following measurements
illustrate some of the differences. The tests were run with between 1 and 256 concurrent
processes, and run on a Vinum volume consisting of a number of drives of widely
different performance characteristics. Four drives were pre-SCSI-1:

Random read Sequential read Random write Sequential write 
ID K/sec /sec K/sec /sec K/sec /sec K/sec /sec 

slow 549.9 35 708.1 43 557.4 34 564.9 34 


Another 6 were three year old narrow SCSI drives: 


Random read Sequential read Random write Sequential write 
ID K/sec /sec K/sec /sec K/sec /sec K/sec /sec 

medium 1850.9 111 1844.9 113 1801.0 112 1854.1 113 


The final two drives were modern UDMA-66 IDE drives:


Random read Sequential read Random write Sequential write 
ID K/sec /sec K/sec /sec K/sec /sec K/sec /sec 

fast 2187.8 134 17790.9 1086 2147.5 133 22025.7 1344 


The Vinum volume was organized as a single striped plex with a 256 kB stripe size, 
which is large enough to ensure that 95% of all transfers map to a single drive. 
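
The 95% figure can be approximated with a short calculation. The Python sketch below assumes a fixed 16 kB transfer (the rawio default average) starting at a uniformly distributed, sector-aligned position within a stripe; real random transfers vary in length, so this is only an estimate. The function name is illustrative.

```python
SECTOR = 512  # bytes per sector


def single_drive_fraction(transfer_bytes, stripe_bytes):
    """Fraction of sector-aligned start positions for which a transfer
    stays within one stripe, and therefore on one drive."""
    transfer = transfer_bytes // SECTOR
    stripe = stripe_bytes // SECTOR
    fitting = sum(1 for start in range(stripe) if start + transfer <= stripe)
    return fitting / stripe


print(f"{single_drive_fraction(16384, 256 * 1024):.3f}")
```

Under these assumptions about 94% of transfers map to a single drive, which is consistent with the figure quoted above.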

The following series of tests shows the relationship between performance and number
of processes for this volume. The ID column is set to the number of processes:


# rawio -x; echo; for i in 1 2 4 8 16 32 64 128 256; do
++ rawio -p$i -I $i -n 2048 -h -a /dev/vinum/striped;
++ done
          Random read      Sequential read  Random write     Sequential write
ID         K/sec  /sec      K/sec  /sec      K/sec  /sec      K/sec  /sec
1          753.8    47     1403.6    86      750.6    46     1386.4    85
2         1407.2    85     1355.2    83     1395.2    87     1538.9    94
4         2317.3   147     1783.0   109     2352.3   140     1770.3   108
8         3270.4   203     2734.3   167     3326.2   200     2791.3   170
16        3788.7   231     4148.3   253     3743.1   235     2748.7   168
32        4184.7   258     2585.5   158     4085.4   254     3210.9   196
64        4301.0   267     2528.1   154     4332.8   265     3756.3   229
128       4338.4   264     5009.1   306     4631.0   290     4778.7   292
256       4744.9   293     3181.8   194     4954.9   304     4659.4   284


There are a number of differences between these results and those for a single disk: 


• With a single process, I/O rates are approximately the average of the drive transfer 
speeds. With only a single process, the test effectively accesses only one drive at a 
time. 

• With a second process, the random I/O performance almost doubles: there are
sufficiently many disks that there is little contention.

• With the second process, the “sequential” access does not increase significantly. 
This particular measurement is not easy to reproduce: depending on the random 
choice of start addresses for each process, the processes may end up contending for 
the same disks, or they may not. The following output of a number of repetitions 
of this test shows the problem: 


Random read Sequential read Random write Sequential write
K/sec /sec K/sec /sec K/sec /sec K/sec /sec

1644.9  100
1568.2   96
1656.6  101
1575.7   96
1402.5   86
1371.1   84
1598.5   98
1391.5   85
1473.4   90

1676.6  102
1543.7   94
1617.0   99
1674.4  102
1592.4   97
1549.8   95
1706.4  104
1622.9   99
1682.3  103


• Increasing the process count continues these trends: random I/O throughput
increases significantly, while “sequential” I/O throughput is less predictable and
tends to remain below the level of random I/O.

• With a large number of processes, the random I/O throughput is much higher than 
any single drive can achieve. 

Limitations and further directions

rawio still has a number of limitations: 

• rawio does not pretend to understand file system behaviour. This is an area which
needs further investigation. In particular, ufs superblock and inode updates can
cause hot spots which are exacerbated by RAID-1 stripe sizes which are a power
of 2.

• rawio does not attempt to quantify disk behaviour. Many disks show markedly
different performance characteristics over different parts of the surface. rawio
should be able to recognize these performance bands and report on them.

• rawio cannot always calculate the size of the disk partition correctly. As a result,
it is possible that it will attempt to write outside the bounds of the partition,
causing errors like these:

ID K/sec /sec K/sec /sec K/sec /sec K/sec /sec

ad0g offset 5136940032, filesize 20524783104

offset 18889022464, filesize 20524783104

Child 7 Bad read at 18889022464: Invalid argument (22), iocount -1
Child 6 Bad read at 18889022464: Invalid argument (22), iocount -1

At the moment it is necessary to use the -s option to specify the size of the
partition if this problem occurs.


Availability 

rawio is available for BSD platforms at ftp://ftp.lemis.com/pub/rawio/. It should be
portable to most UNIX systems, but there are problems porting it to Linux, since
Linux does not have raw disk devices.

Bibliography 

[Vinum] The Vinum volume manager, http://www.lemis.com/vinum.html. 

instruction set design. He is also the author of “Porting UNIX Software” (O’Reilly
and Associates, 1995), “Installing and Running FreeBSD” (Walnut Creek, 1996), and
“The Complete FreeBSD” (Walnut Creek, 1997-1999). About the only thing he
hasn’t done is writing commercial applications software. Browse his home page at
http://www.lemis.com/~grog/.

inal instruments, exploring the Australian countryside with his family on their Arabian
horses, or exploring new cookery techniques or ancient and obscure European lan¬ 

NAME

rawio - Test performance of low-level storage devices 

SYNOPSIS 

rawio [-a] [-c transfer-count] [-f] [-h] [-I name] [-n record-count] [-p
process-count] [-R] [-r] [-s size] [-v verbosity] [-W [percentage]] [-w
[percentage]] special

DESCRIPTION

rawio tests the speed of the low-level character I/O device special in a concurrent environment. It is
intended for comparisons of storage devices on a single system, and is not suited for cross-platform
performance testing.

By default, rawio spawns eight processes, each of which performs the same test. Four tests are available: 

Random Read 

The random read test reads varying length records from the specified device special , starting at 
random positions within the file. The offset is necessary to protect the disk label and any possible 
future extensions. 

Sequential Read 

The sequential read test reads constant length records from the specified device special, starting
at offset 32 sectors from the beginning of the file.

Random Write

The random write test writes varying length records to the specified device special, starting at
random positions within the file. THIS TEST OVERWRITES DATA ON THE SPECIAL
DEVICE. DO NOT USE IT ON A DRIVE WHICH CONTAINS IMPORTANT DATA.

Sequential Write

The sequential write test writes constant length records to the specified device special, starting
at offset 32 sectors from the beginning of the file. The offset is necessary to protect the disk label and
any possible future extensions. THIS TEST OVERWRITES DATA ON THE SPECIAL
DEVICE. DO NOT USE IT ON A DRIVE WHICH CONTAINS IMPORTANT DATA.

If no tests are specified with the options -R, -r, -W or -w, rawio performs the random read and the
sequential read tests, which are non-destructive.

rawio resembles bonnie in some of the things it does. It differs strongly from bonnie by using a raw
disk device, which bypasses buffer cache. As a result, some of the tests that bonnie performs are
meaningless, for example character I/O.

OPTIONS 

-a Perform all tests (Random Read, Sequential Read, Random Write, Sequential Write). 

-c transfer-count 

Specify the length of sequential transfers or the average length of random transfers. The length of
a random transfer can be up to twice this value. It may be specified either in sectors (< 512) or in
bytes (>= 512), and must be an integral number of sectors. This value defaults to 16384 bytes (32
sectors). The maximum value is system-dependent. On FreeBSD it is 256 sectors (131072 bytes).
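
The rule for interpreting the value can be sketched in Python. This is an illustration of the description above, not code from rawio; the function name is hypothetical and the system-dependent maximum is not checked.

```python
SECTOR = 512  # bytes per sector


def transfer_count_bytes(value):
    """Interpret a -c argument: values below 512 are sectors, others are bytes."""
    if value <= 0:
        raise ValueError("transfer count must be positive")
    if value < SECTOR:
        return value * SECTOR          # given in sectors
    if value % SECTOR != 0:
        raise ValueError("byte count must be a whole number of sectors")
    return value                       # given in bytes


print(transfer_count_bytes(32))      # 32 sectors
print(transfer_count_bytes(16384))   # 16384 bytes
```

Both calls describe the same default transfer length.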

-f When performing sequential transfers, start all the transfers at the same offset into the device. This
has a dramatic effect on the throughput. See INTERPRETING THE RESULTS below for a
discussion of this flag.

-h Suppress headings in the output. This can be useful to create output files intended for processing 
by plotting utilities. 

-I name 

Specify a name to be written in the results to identify the test. If this option is omitted, rawio
uses the name of the drive.

-n record-count 

Specify the total number of records to transfer. The default value is 16384. If the number is not 
divisible by the number of processes, the first remainder processes transfer one extra record. 

The sequential transfer tests will stop at the end of the device, which may result in fewer records 
than indicated being transferred. 
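
The distribution of records over processes described above can be written out as follows. This Python sketch only illustrates the rule as stated; the function name is hypothetical.

```python
def records_per_process(record_count, process_count):
    """Split record_count over process_count processes; the first
    `remainder` processes each transfer one extra record."""
    base, remainder = divmod(record_count, process_count)
    return [base + 1 if i < remainder else base for i in range(process_count)]


print(records_per_process(16384, 8))   # default record and process counts
print(records_per_process(10, 4))      # a case with a remainder
```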

-p process-count 

Specify the number of processes to start. The default value is 8. 

-R Perform a Random Read test, identified as RR in the output. This flag may be used in combination 
with other test specification flags. 

-r Perform a Sequential Read test, identified as SR in the output. This flag may be used in
combination with other test specification flags.

-S When performing the random tests, use pseudo-random data from an internal table instead of
calling the random number generator. This makes the results more predictable and thus more
repeatable. This can be of use when comparing results on different platforms, though it is not clear that
the differences in the random numbers generated play a significant factor in the differences
between consecutive measurements.

-s size 

Specify the size of special in bytes. rawio tries several different ways to determine the size of
the device, but it is possible that all will fail. In this case, you should specify it manually. This
value will override any attempt to determine the size programmatically, so it can also be of use to
restrict the part of the device used by the random seek tests.

-v verbosity 

Specify that more verbose output is desired. verbosity is an integer specifying the amount of
information desired. Currently the only value which makes any difference is 1.

-W [percentage] 

Perform a Random Write test, identified as RW in the output. This flag may be used in combination
with other test specification flags. If you specify the optional percentage argument, this test
will interleave read and write accesses, performing percentage % writes and (100 -
percentage) % reads. THIS TEST OVERWRITES DATA ON THE SPECIAL DEVICE.
DO NOT USE IT ON A DRIVE WHICH CONTAINS IMPORTANT DATA.

-w [percentage]

Perform a Sequential Write test, identified as SW in the output. This flag may be used in
combination with other test specification flags. If you specify the optional percentage argument, this
test will interleave read and write accesses, performing percentage % writes and (100 -
percentage) % reads. THIS TEST OVERWRITES DATA ON THE SPECIAL DEVICE.
DO NOT USE IT ON A DRIVE WHICH CONTAINS IMPORTANT DATA.

OUTPUT FORMAT

rawio can produce three different styles of output. Without the -v option, rawio produces the following 
kind of output: 

Random read Sequential read Random write Sequential write 

ID K/sec /sec K/sec /sec K/sec /sec K/sec /sec 

rawdisk 81.0 5 149.4 5 88.1 6 129.1 4 


Each test produces only a single line of output. The values for each test are the throughput in kilobytes per 
second, and the number of transfers per second. 

With the -v 1 option, rawio produces the following style of output: 


Test  ID    K/sec   %User  %Sys  %Total  I/Os
RR    da0c  1200.8  0.2    2.4   2.6     800
SR    da0c  944.0   0.1    0.7   0.9     800
RW    da0c  1380.3  0.2    2.8   3.0     800
SW    da0c  947.8   0.0    0.8   0.9     800

The first column is an abbreviation for the test. See the test descriptions above. The second column is the
identifier for the test, which in this example defaults to the name of the disk, da0c, because none had been
specified.

The third column shows the aggregate data transfer speed. The fields %User , %Sys and %Total show the 
percentage user, system and total (user + system) CPU time used by the processes. The field I/Os shows 
the number of I/O requests issued to special. 

The verbose output prints the following information: 


Test name: sample
Transfer count: 32768
Record count: 100
Process count: 8
Device size: 1648000000

Test  ID      Time       K/sec   %User  %Sys  %Total  Reads  Writes
RR    sample  10.143686  1243.1  0.1    2.6   2.8     800    0
SR    sample  27.822950  942.1   0.0    0.8   0.9     800    0
RW    sample  9.470939   1321.3  0.1    2.8   2.8     0      800
SW    sample  27.719114  945.7   0.0    0.9   0.9     0      800

The time field shows the elapsed time for the complete test, and the columns Reads and Writes show
itemized information about the I/Os.

This format is likely to change and become more useful.


Either format is intended to make it easy to extract test information from a log file. For example, to extract
information on a specific test on different devices, you can enter:

$ grep ^RR stripe.l.log
RR    s1k     220.8   0.0    2.2   2.2     512
RR    s2k     386.2   0.1    2.0   2.0     512
RR    s4k     620.6   0.2    2.2   2.3     512
RR    s8k     881.3   0.2    2.4   2.6     512
RR    s16k    1083.6  0.2    2.7   2.9     512
RR    s32k    1192.2  0.3    2.5   2.8     512
RR    s64k    1239.9  0.1    2.9   3.1     512
RR    s128k   1306.8  0.4    2.8   3.1     512
RR    s256k   1346.0  0.8    2.5   3.2     512
RR    s512k   1360.8  0.0    3.0   3.0     512
RR    s1m     1363.7  0.0    3.2   3.2     512
RR    s2m     1387.9  0.1    2.8   2.9     512
RR    s4m     1357.6  0.2    2.8   3.1     512

To extract information on a specific test,

$ grep s128k stripe.l.log
SR    s128k   1719.1  0.0    1.7   1.7     512
RR    s128k   1306.8  0.4    2.8   3.1     512
SW    s128k   2126.9  0.1    2.1   2.2     512
RW    s128k   1516.4  0.0    3.4   3.4     512
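
For processing beyond what grep offers, the -v 1 lines can be split into fields. The Python sketch below assumes the seven-column layout shown above (test, ID, K/sec, %User, %Sys, %Total, I/Os); the record type and function names are illustrative and not part of rawio.

```python
from collections import namedtuple

Result = namedtuple("Result", "test ident kb_per_sec user sys total ios")

TESTS = ("RR", "SR", "RW", "SW")


def parse_line(line):
    """Parse one -v 1 result line; return None for headings or other text."""
    fields = line.split()
    if len(fields) != 7 or fields[0] not in TESTS:
        return None
    kb, user, system, total = (float(f) for f in fields[2:6])
    return Result(fields[0], fields[1], kb, user, system, total, int(fields[6]))


sample = """\
Test ID K/sec %User %Sys %Total I/Os
RR s1k 220.8 0.0 2.2 2.2 512
RR s128k 1306.8 0.4 2.8 3.1 512
SW s128k 2126.9 0.1 2.1 2.2 512
"""

results = [r for r in map(parse_line, sample.splitlines()) if r]
for r in results:
    if r.test == "RR":
        print(r.ident, r.kb_per_sec)
```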


INTERPRETING THE RESULTS 

rawio is designed to simulate the behaviour of real-world storage devices in some common situations. 
When analysing the results, it is important to understand these situations. 

Random file access 

Relatively true random access situations, such as are demonstrated by the random read and write 
tests, occur with web page accesses. Many database applications also behave in this manner. In 
each case, the issue is complicated by directory or index access, so this test is idealized. 

In this test, the number of processes should be set to the approximate number of requestors.
Performance will usually show an improvement with increasing number of requestors. This
performance improvement will lessen with increasing number of processes, and may show a drop with a
large number. This drop can be attributed to a large number of factors, not the least of which is
natural measurement error.


Sequential file access 

True sequential file access occurs when the disk subsystem reads sequential blocks off a disk. It is
extremely rare in a large system: it implies that only one process is doing the reading. As soon as
two processes read, even if they are reading the same file sequentially, the access is no longer
purely sequential as seen by the disk. Instead, multiple read requests are issued for the same spatially
related blocks.

Even this kind of access is relatively rare. First, the file system buffer cache will generally resolve
these issues, so only one read will be issued. Secondly, typical “sequential access” is more
typified by an ftp server or streaming video server, where multiple processes read relatively large files
in a sequential manner. This is the model which the rawio sequential access tests perform by
default. If you want to test multiple sequential access to the same area of disk, use the -f flag.

The normal sequential access tests show a marked decrease in performance with increasing
number of requestor processes. This is because the access becomes more and more random with
increasing number of requestors. On the other hand, with the -f flag, performance improves
dramatically with increasing number of requestors, since now the on-device cache can satisfy most
requests. With 8 requestors, performance improvements of 3000% to 4000% can be expected. This
is the scenario that RAID array vendors like to show, since it can show really dramatic
performance. Unfortunately, the figures are almost meaningless.

The effect of transfer size 

Modern storage devices transfer data at between 10 MB/s and 80 MB/s. A typical transfer of 8 kB
thus takes between 100 µs and 800 µs. By contrast, typical positioning latency is in the order of 8
ms, between 10 and 80 times as long. Obviously the size of the transfer strongly affects the
throughput. Unfortunately, it is often difficult to influence the transfer size. Text web pages, for
example, tend to be less than 16 kB in size. ftp files and image data are larger, but it’s often
difficult to persuade the system to transfer in larger quantities; a lot depends on the program
performing the access. It’s beyond the scope of this man page to discuss methods of improving
performance, but rawio provides the mechanism for measuring the potential differences. A typical
average transfer size for ufs is between 6 kB and 7 kB.
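
The figures in this paragraph follow from simple arithmetic, shown here as a Python sketch. It takes kB as 1000 bytes and MB as 1,000,000 bytes; the function name is illustrative.

```python
def transfer_time_us(size_kb, rate_mb_s):
    """Time in microseconds to move size_kb kilobytes at rate_mb_s MB/s."""
    return size_kb * 1000 / rate_mb_s


LATENCY_US = 8000  # typical positioning latency of 8 ms

for rate in (80, 10):
    t = transfer_time_us(8, rate)
    print(f"{rate} MB/s: {t:.0f} us, latency is {LATENCY_US / t:.0f} times as long")
```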


GOTCHAS 

rawio measures I/O system performance, so you should use it against the raw disk device. It will work 
against block devices, but you’ll be measuring the performance of buffer cache, not the underlying device. 


The rawio write tests overwrite the data on the device. Don’t use them on devices containing data you 
care about. 

SEE ALSO 

bonnie(1), iozone(1), iostat(8)


AUTHOR 

Greg Lehey <grog@lemis.com>. 

Preventing the Unauthorised Binary

Brett Lymn <blymn@baesystems.com.au> 

January 19, 2000 

ABSTRACT 

This paper will demonstrate a method of ensuring the binaries running on a Unix
system have not been subverted or modified and that unauthorised binaries can be
prevented from executing even if the root account is being used. This method also allows
sophisticated shell interpreters to be safely installed by preventing the interpreter
being run directly by exec. This paper is a description of work in progress and some
features described may change.

1. Introduction 

At the time of writing, the current trend of system attack is to gain control of many victim machines and use
the resources of these victims to disrupt the services on a third party system. Sophisticated distributed
Denial of Service (DoS) tools such as Stacheldraht[1], Trinoo[2], Tribal Flood Network[3][4] and others[5]
are being deployed by attackers to perform DoS attacks against various targets.

Part of the reason why these tools are widely deployed is the poor set up and maintenance of machines
connected to the Internet. Basic precautions for securing machines have not been taken and auditing of the
machines is not performed. Part of the reason, also, is that a lot of these machines are UNIX systems and
once the attacker has gained root privileges they have total control of the victim system. This means that
they can install and run any binaries they choose or modify current system binaries to perform extra
functions - such modified binaries are called Trojan Horses and have been commonly used to log user
passwords to a file or provide a root shell on demand.

Poor system setup is a matter of educating administrators about the correct manner in which to secure a 
system. It seems amazing that this still needs to happen given the sheer volume of information available 
both commercially and freely but evidence shows that the education process needs to continue. 

A method of preventing unauthorised binaries running and detecting Trojan horses will allow
conscientious administrators to more easily audit their systems and have confidence in the binaries they run. There
are already methods of doing this but they do have some difficulties and this paper attempts to show a new
approach to the problem.

2. Prior Art 

Probably the most well known tool for auditing a running system is a tool called tripwire[6]. This tool is
quite old now but it is still useful in auditing a system. Tripwire validates the signatures of files in its
control file against the files on the running system and flags any files that do not match. Multiple fingerprinting
methods can be employed depending on the administrator’s tastes as well as checking the file modification
times. Due to the nature of tripwire there are aspects of it that are difficult to secure properly so that the
results can be trusted. This difficulty may be part of the reason why tripwire is not more widely used.
Some of the problems with tripwire are: 

1. Tripwire runs as a user level process and as such there is nowhere that it can securely cache data 
between runs. It also means that an attacker may be able to subvert the running of tripwire to either 
prevent it running or run it with a subverted configuration. 

2. It is difficult to assure that the tripwire binary being run is the true, correct version and that the 
binary has not been manipulated by an attacker. 

3. Tripwire can be computationally expensive to run. The degree of expense depends on the number
of differing fingerprinting methods configured and the fingerprint algorithms used. Normally, there
is a trade off between the number of times a day tripwire is run and the processor load. A machine
that continually runs tripwire may be too slow to perform real work. This trade off means that there
is a window of opportunity for an attacker to break into a machine and use that machine for
whatever she wants before tripwire runs again and, possibly, flags a problem.



4. Depending on the configuration, reports from tripwire can be very verbose about changes to the
system it is run on. Normally, parts of the machine are deliberately not scanned or only file status is
checked. These blind spots may be used by an attacker to store their toolkit on unnoticed.

Some of these points can be addressed by very careful configuration and strict adherence to procedures for 
maintaining the tripwire configurations and databases. 
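
The basic idea of fingerprint checking can be sketched in a few lines of Python. This is a minimal illustration of comparing stored digests with the files on disk, using SHA-256 as an assumed fingerprint method; it is not how tripwire itself is implemented, and it shares the weaknesses listed above, since it runs as an ordinary user process. All names are hypothetical.

```python
import hashlib
import os
import tempfile


def fingerprint(path):
    """Return the SHA-256 digest of a file as a hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(baseline):
    """Return paths that are missing or whose digest differs from the baseline."""
    return [path for path, expected in baseline.items()
            if not os.path.exists(path) or fingerprint(path) != expected]


# Demonstration on a temporary file standing in for a system binary.
with tempfile.TemporaryDirectory() as directory:
    binary = os.path.join(directory, "binary")
    with open(binary, "wb") as f:
        f.write(b"original contents")
    baseline = {binary: fingerprint(binary)}
    print(changed_files(baseline))             # nothing modified yet
    with open(binary, "wb") as f:
        f.write(b"trojan contents")
    print(changed_files(baseline) == [binary])  # the modification is flagged
```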

3. Other Methodologies 

In the release of 4.4BSD there is a kernel securelevel variable that affects the behaviour of the kernel when 
it is set to various values. In some 4.4BSD derived systems there is also the concept of file flags that extend 
the normal file permission semantics to include the ability to make a file immutable (unable to be changed), 
append only, opaque, archived or nodump. At securelevel 0 normal file permissions apply to device files
and the immutable and append only flags can be turned off. At securelevel 1 the immutable and append only
flags cannot be changed, and /dev/kmem, /dev/mem and the raw devices of mounted file systems are made read
only. Securelevel 2 is the same as securelevel 1 except that raw devices are always read only.
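
The securelevel rules described above can be summarised as a small decision table. The following Python sketch is illustrative only: the function names are hypothetical, and it models just the behaviour stated in the text rather than the full 4.4BSD semantics.

```python
def can_clear_file_flags(securelevel: int) -> bool:
    """Immutable and append only flags can only be turned off at securelevel 0."""
    return securelevel <= 0


def kernel_memory_writable(securelevel: int) -> bool:
    """/dev/mem and /dev/kmem become read only from securelevel 1 upwards."""
    return securelevel <= 0


def raw_device_writable(securelevel: int, mounted: bool) -> bool:
    """Raw disk devices: level 1 protects mounted file systems, level 2 protects all."""
    if securelevel <= 0:
        return True
    if securelevel == 1:
        return not mounted
    return False


if __name__ == "__main__":
    for level in (0, 1, 2):
        print(level,
              can_clear_file_flags(level),
              kernel_memory_writable(level),
              raw_device_writable(level, mounted=True),
              raw_device_writable(level, mounted=False))
```
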

With these features it is possible, in combination with judicious use of the read-only and no-exec mount
options, to build a system that has all the binaries made immutable and mounted on read-only partitions,
with the log files and other temporary files on partitions that are mounted no-exec. A machine such as this is
not impossible to build but requires some painstaking setup. One problem with this setup is that not all
software respects the clear separation of writable files from executable binaries. Some programs, for
example the ISC dhcp client, run programmatically written shell scripts. This, in itself, is not a bad thing
but it does mean that there needs to be some writable place that also allows execution if you want dhcp to
work. This forces a breach in the system that can be used by an attacker. DHCP is not the only case of
this: some programs, such as syslog, expect to be able to write to /dev to place a listening socket there.
Syslog can be told to put this socket elsewhere but some other programs may not be as cooperative.

To cater for the exceptions there is the temptation to leave the partitions writable and just make the binaries
immutable. This can lead to a false sense of security because, unless the locking process is done extremely
carefully, the attacker will be able to move aside the immutable binary and place their own Trojan horse in
its place. Unless a tool like tripwire is used in this situation there is no assurance that the binaries have not
been manipulated through exploitation of a setup error. There is also no method of preventing the execution
of binaries loaded onto the system.

4. A Different Way 

After some discussions on the Bugtraq[7] mailing list an analysis of the NetBSD[8] kernel sources was
performed to see if the checking of executables could be made more stringent by performing a
cryptographically strong fingerprinting of the executable. It was found that, ultimately, all exec calls go
through a single execve call and that in this call there is a function called check_exec that is used to
validate the file to ensure that it can be executed. The check_exec function can be extended to include more
checks and these checks will apply to any exec made on the system.

The NetBSD check_exec routine was modified to perform an md5 fingerprint of the executable that was
going to be exec’ed. The md5 fingerprint was chosen because the code for evaluating the fingerprint was
already in the kernel and it was simply a matter of modifying check_exec to make the appropriate calls. A
pseudo-device driver was added that allowed a user level program to load a list of fingerprints into the
kernel memory space. The driver resolved the filename string passed in to a device node and i-node and stored
the information in a simple linked list in the kernel memory space. Loading of fingerprints into kernel
space is denied when securelevel is above 0.

With the modifications, when an exec was performed the check_exec routine evaluated the md5 fingerprint
of the given file and then searched the stored list of fingerprints. If the device node and i-node of the
executable were found in the stored list then the evaluated fingerprint was compared against the stored one.
If the fingerprints matched then check_exec returned success. If the file could not be found on the stored
list or the fingerprint did not match then the behaviour of check_exec depended on the securelevel. A new
securelevel was introduced, securelevel 3. At this securelevel the check_exec call returns a permission
denied error for any file that does not pass the md5 fingerprint check. For securelevels below 3 an error was
logged to the console for any file that failed the md5 check but check_exec returned success. This allows the
system to start up without a fingerprint list loaded, which is necessary because otherwise there would be no
way of booting the system. Once the fingerprints were loaded the securelevel could be raised to prevent an
attacker installing more signatures.
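
The decision logic described above can be modelled in user space. The following Python sketch is not the kernel code: the class and function names are hypothetical, a dictionary stands in for the in-kernel linked list, and the caller supplies the file contents instead of the kernel reading them.

```python
import errno
import hashlib

ENFORCING_SECURELEVEL = 3  # the new securelevel introduced in the text


class FingerprintList:
    """Stand-in for the in-kernel list, keyed by (device, i-node)."""

    def __init__(self):
        self._entries = {}

    def load(self, dev, inode, digest, securelevel):
        # Loading is refused once securelevel has been raised above 0.
        if securelevel > 0:
            raise PermissionError("fingerprint list is locked")
        self._entries[(dev, inode)] = digest

    def lookup(self, dev, inode):
        return self._entries.get((dev, inode))


def check_exec(fingerprints, dev, inode, contents, securelevel, console):
    """Return 0 to allow the exec, or errno.EACCES to deny it."""
    digest = hashlib.md5(contents).hexdigest()
    stored = fingerprints.lookup(dev, inode)
    if stored is not None and stored == digest:
        return 0
    if securelevel >= ENFORCING_SECURELEVEL:
        return errno.EACCES
    # Below securelevel 3 the failure is only reported, so the system can boot.
    console.append("fingerprint check failed: dev=%d inode=%d" % (dev, inode))
    return 0
```

With a fingerprint loaded at securelevel 0, a modified copy of the same file is merely logged at securelevel 1 but is refused with a permission denied error at securelevel 3.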

Performing the fingerprint calculation every time had a severe impact on the performance of the machine. 
A full make of a NetBSD kernel took 1.7 times longer with the fingerprint enabled than it did without. This 
was a noticeable slow down. There appeared to be three things that could be done to address this slow 
down: 

1. Do nothing. Hardly technically satisfying but it could be argued that the extra security was worth 
the slow down. Besides which the intention was that fingerprinting would only be done on firewall 
type machines where most processes are long running. 

2. Buy a faster machine. Making the problem go away by throwing money at it is valid but not usually 
popular. 

3. Implement some caching scheme so the fingerprints need not be evaluated every time. Because the
fingerprinting code is operating inside the kernel it is safe to assume the cached data cannot be
modified.

Option 3 was chosen and research started on how best to cache the data. The fingerprint cache needs to be
fast to look up, keep the associated path of the binary, have a high hit rate and be flushed when the
associated file is modified or deleted. As it turned out there was already a cache that did all of these
things in the NetBSD kernel. It was found that the Directory Name Lookup Cache (DNLC) already caches a lot of
useful information about a file, including the v-node, and that the cache was already performing all the
functions required for the caching of the fingerprint data.

The v-node kernel structure was modified to include a single byte fingerprint status. Because the NetBSD
kernel attempts to keep v-node data as long as possible in case the v-node is used again, this operates as an
effective cache for the fingerprint status of the file. A single byte is used to keep the fingerprint status
rather than keeping the entire fingerprint. This saves memory and means that only a single byte needs to be
checked to verify the fingerprint status of a file. The fingerprint is validated against the loaded list value
just after it is calculated and the status byte is set to indicate the result of the comparison between the
calculated and stored values. This fingerprint status is only invalidated when the v-node is cleaned up for
use by another file. So, as long as there are no activities, such as a find on a filesystem, that cause a run
on the v-node cache, the v-node fingerprint status will be available for later invocations of the same
executable. This results in a large reduction in the amount of time spent evaluating the fingerprints on
executables by the kernel. Measurements of a make of the NetBSD kernel show that, with the fingerprint status
cached, the make took about 5% longer with the signed executables than it did without them. From this it can
be seen that caching the fingerprint results makes a large positive impact on the performance of the system.
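
The effect of the status byte can be illustrated with a small model. In the Python sketch below the VNode class, the status constants and the recycle function are all hypothetical stand-ins for the kernel structures. It only shows that the fingerprint is computed once per v-node lifetime and that reuse of the v-node by another file discards the cached status.

```python
import hashlib
from dataclasses import dataclass

FP_UNKNOWN, FP_VALID, FP_INVALID = 0, 1, 2  # values of the one byte status


@dataclass
class VNode:
    dev: int
    inode: int
    fp_status: int = FP_UNKNOWN


class FingerprintChecker:
    def __init__(self, stored):
        self.stored = stored      # {(dev, inode): md5 hex digest}
        self.hash_count = 0       # how many times md5 was actually computed

    def status(self, vnode, read_file):
        # Fast path: a single byte comparison when the status is cached.
        if vnode.fp_status != FP_UNKNOWN:
            return vnode.fp_status
        self.hash_count += 1
        digest = hashlib.md5(read_file()).hexdigest()
        expected = self.stored.get((vnode.dev, vnode.inode))
        vnode.fp_status = FP_VALID if digest == expected else FP_INVALID
        return vnode.fp_status


def recycle(vnode, dev, inode):
    """Model the v-node being cleaned up and reused for another file."""
    vnode.dev, vnode.inode, vnode.fp_status = dev, inode, FP_UNKNOWN
```

Repeated execs of the same binary, as happens with the compiler during a kernel make, then cost one hash calculation followed by cheap status checks.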

5. Handling Shell Interpreters 

The way that a shell script exec is handled by the NetBSD kernel is different to a normal executable. When 
exec is called on a shell script, check_exec is called to verify the file is a candidate for execution. If 
check_exec returns success then the appropriate exec handler for the binary type is selected and executed. 
In the case of a script this exec handler is exec_script. The exec_script handler parses the header of the 
script and extracts the script interpreter and then calls check_exec to verify the interpreter is valid for 
execution. This two step process presents an interesting opportunity to treat the shell interpreter differently 
to a normal exec. 

The NetBSD kernel check_exec routine was modified to take an extra parameter. This
parameter is a flag that indicates whether check_exec is being called from execve itself or from
an exec handler. The only exec handler that used check_exec was exec_script, which was modified to pass
the new parameter. The signed exec pseudo-device structure was modified to include an extra flag and the
signature loader changed to support the flag. This flag is set to indicate that only indirect calls of the
binary are allowed; that is, the binary is only allowed to be a script interpreter and any direct execution
request of the binary will be denied.

It is important to note that both the shell script and its interpreter are subjected to validation of their
fingerprints in check_exec, regardless of the status of the indirect execution flag. This provides assurance
that neither the script nor the interpreter has been tampered with. The indirect execution flag gives the
opportunity to install a powerful interpretive language, for example Perl, that can only be used to run
verified scripts, removing the risk that the language's capabilities will be used by an attacker.
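
The two step check for scripts, together with the indirect execution flag, can be sketched as follows. This Python model makes several simplifying assumptions: files are keyed by path rather than by device and i-node, the system is treated as running at the enforcing securelevel, and all of the names other than check_exec and execve are hypothetical.

```python
import errno
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprint:
    digest: str
    indirect_only: bool = False  # may only be run as a script interpreter


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def check_exec(fingerprints, path, contents, direct):
    entry = fingerprints.get(path)
    # The fingerprint is always validated, whatever the indirect flag says.
    if entry is None or entry.digest != md5_hex(contents):
        return errno.EACCES
    if direct and entry.indirect_only:
        return errno.EACCES
    return 0


def interpreter_of(contents: bytes):
    """Return the interpreter named on a '#!' line, or None for a plain binary."""
    if not contents.startswith(b"#!"):
        return None
    first_line = contents[2:].split(b"\n", 1)[0].strip()
    return first_line.split()[0].decode() if first_line else None


def execve(fingerprints, files, path):
    contents = files[path]
    err = check_exec(fingerprints, path, contents, direct=True)
    if err:
        return err
    interp = interpreter_of(contents)
    if interp is None:
        return 0
    # Second call, made on behalf of the script handler: an indirect exec.
    return check_exec(fingerprints, interp, files.get(interp, b""), direct=False)
```

If the Perl binary is loaded with the indirect flag set, a signed script that names it on its first line runs normally, while a direct exec of the Perl binary, or of a script whose contents no longer match the fingerprint, is denied.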

An interesting application would be to make the binary /bin/sh a candidate for indirect execution. Careful
checking would be needed to ensure no executables attempted to exec /bin/sh directly but if this was done
then attempts to break into the machine by manipulating a buffer overflow to execute /bin/sh would be
thwarted. This is not a complete solution, as the attacker could attempt to exec another shell, but that can
be addressed in the same way. The measure serves to "raise the bar" on the system security by preventing many
scripted attacks from working, and it makes it difficult for an attacker to tell whether a remote exploit
failed because the buffer overflow failed or because the shell exec was denied.

6. Problems with Signed Execution

There are some things that performing a signed executable exec check will not prevent. One of these is
the manipulation of shared libraries. There is nothing preventing an attacker downloading a
shared library that has a Trojan function that will be activated when particular arguments are passed to it.
This sort of exploit is a bit more complex to implement but is achievable. The attackers do run the risk of
making other programs malfunction by accidentally triggering the exploit but sufficient imagination would
make this unlikely.

The signed executable exec check does not protect against an attacker executing code on the processor
stack via a buffer overflow or similar. As described above, this type of exploit can be made more difficult
by making the exec of the common shells indirect, which means all scripts will still function but a direct
attempt to exec a shell will fail. Login shells would need to be placed in an obscured location to allow
logins to the system. This approach is a security by obscurity method and should not be trusted as a security
measure in itself. If an attacker can find out where the executable shells are located then the buffer
overflow can be modified to use the new location. So, this method is only as secure as the information about
the locations of the executable shells.

The signed exec fingerprints are, currently, stored in a file and loaded during boot in an rc script. The
startup file, the fingerprints and the signed exec fingerprint loader all need to be protected as they now
form the crux of the security of the system. These can be protected by making the files immutable or similar.
Tripwire can also be used as a defense against manipulations of the system: since the tripwire binary can now
be verified as the correct one, it can be used to scan for tampering on the system. Note that now that the
binaries are automatically checked by the kernel the work that tripwire has to do is much reduced, giving the
possibility that tripwire could be run more frequently than would otherwise be practical.

7. Possible Applications 

Signed executables can be used in any application that has a defined set of executables that need to be run
and that do not change often. Some possible scenarios are firewalls, routers and secure
workstations where only approved binaries are to be run. The signed executable exec is not suited to being run
in an environment where users need to execute code that is constantly changing; a software development
environment would be such a case.

The signed executable fingerprint file can also be used as a method of securely distributing the operating
system to end users. Once the user has installed the operating system they could request an encrypted copy
of the fingerprint file. This fingerprint file could then be decrypted using public key decryption and the
resulting file used to protect the new system. This provides a secure method of ensuring the original
distribution was not tampered with.




8. Conclusion 

The signed executable exec kernel modification provides a visible method of verifying that the executable
being run is the correct one and has not been tampered with. This provides a level of trust in the
executables that is difficult to attain by other means. Because it operates at the kernel level, important
speed gains can be made by using the kernel memory protection mechanisms to protect cached data from tampering.
This kernel modification also gives a fine level of control over what can be executed on the system, even by
root, a capability that was not available before. Also, the addition of the indirect execution check means
that powerful interpreters can safely be installed in security critical situations with the knowledge that the
interpreter cannot be used to run unsigned scripts. This kernel modification does not provide a total
security solution but should be treated as another tool in the chest that can be used to increase the security
of an operational system.

9. Further Work 

Work on the signed executable exec kernel modification is still in progress to address some issues. Some 
things that can be looked at are: 

• Modify the method of loading the file signatures. One approach may be to have the signatures in a 
file in the mount point of the file system. The mount program would need to be modified to open this 
file, mount the file system and then feed the signatures into the kernel. By doing this the signature 
file is automatically hidden as soon as the file system is mounted, hence the signature file is immune 
from tampering. The only difficulty is handling the root file system where the signature file will still 
be exposed. 

• Find a way of applying a similar method of signing to shared libraries. 

• Decrease the impact of looking up the fingerprints by performing a sorting of the in-kernel fingerprint 
list. Currently this list is unsorted and a linear search is performed when the fingerprint is looked up. 
This is not optimal and should be addressed to improve the performance. One simple scheme would 
be to move the fingerprint list entry to the head of the list when it is accessed. This would have the 
effect of rippling the most frequently used fingerprints to the top of the list thus reducing the search 
time. 

• Make the signed fingerprints independent of the ufs filesystem. Currently, the fingerprint list holds a
device/i-node pair for each executable, which means that other file systems cannot be used.
A better approach may be to use the file generation number which is not ufs specific.
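
The move to the head of the list scheme suggested in the list above can be sketched as follows. This is a minimal Python illustration with hypothetical names, in which a Python list stands in for the kernel's linked list.

```python
class MoveToFrontList:
    """Unsorted list with linear search; a hit moves the entry to the head."""

    def __init__(self):
        self._items = []  # (key, fingerprint) pairs, head of the list first

    def add(self, key, fingerprint):
        self._items.append((key, fingerprint))

    def lookup(self, key):
        for index, (candidate, fingerprint) in enumerate(self._items):
            if candidate == key:
                if index:
                    # Frequently used entries ripple towards the head.
                    self._items.insert(0, self._items.pop(index))
                return fingerprint
        return None

    def keys(self):
        return [key for key, _ in self._items]


if __name__ == "__main__":
    fingerprints = MoveToFrontList()
    for name in ("a", "b", "c"):
        fingerprints.add(name, "md5-of-" + name)
    fingerprints.lookup("c")
    print(fingerprints.keys())
```

The search is still linear in the worst case, but a workload that execs a few binaries repeatedly finds them within the first few entries.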

Footnote Presentation
Samba 3.0 Internals 
Andrew Tridgell 


Biography 

Andrew Tridgell is probably best known as the author of Samba. Andrew also works
on a number of other pieces of free software including rsync, JitterBug, KnightCap 
and the ports of Linux to the Fujitsu AP series of parallel computers. 

Holding an unusual place in the open source world, Samba lives on the boundary
between the traditional Microsoft dominated computing world and the emerging open 
source community. It is not uncommon for Samba to be the first open source 
application to be used by a company, thus opening the way to the open source 
revolution. 

Samba 3.0 Internals

tridge@linuxcare.com


Development of Samba 3.0 is well underway, with the release expected sometime in 2000. This 
paper describes some of the key features of Samba 3.0, concentrating on the internal design 
decisions. 

• New internal database system 

• Auto-generated parse code 

• Unicode rewrite 

• PDC Support 

• Loadable VFS system 

• NSS interfaces to Samba 

• Library version of Samba code 

• New documentation 

• spoolss support for NT printing 

• New printing backend 

• New memory handling system 

• Auto-diagnosis system 

• Current status 

New Internal Database System


Triggered by a requirement in the printing code, a re-evaluation of the internal data 
structures in Samba was made. The result was that a new database oriented way of 
representing many internal data structures was adopted. 

A simple database module called tdb was written that has some nice features: 

• Multiple simultaneous writers 

• Very fast operation 

• Persistent storage

• Arbitrary keys and data 

The interface is similar to gdbm, but overcomes the limitation in gdbm that restricts the 
database to a single writer. 

New databases


In Samba 3.0 tdb databases will be used for the following: 

• Internal locking code (previously in shared memory) 

• Connection tracking (previously in ASCII files)

• WINS database (previously in splay trees) 

• Unexpected packet database (previously unsupported) 

• lpq cache (previously in temp files)

• oplock break queue (previously used UDP packets) 

• SID mapping (not previously supported) 

• smbpasswd database (previously in an ASCII file)

A number of other uses for tdb databases are being considered. Because the code is not tied to
Samba, it may also be used in some applications outside Samba.

Auto-generated parse code


For Samba 3.0 we are moving towards auto-generated parse code. We have written an AWK-based
IDL compiler and are starting to write IDL files for the MSRPC protocols.

The eventual aim is to auto-generate all protocol parsing code in Samba, including the main 
SMB code. 

Unicode Rewrite


One of the biggest changes in Samba 3.0 will be the rewrite to use Unicode strings throughout
Samba. While the SMB protocol includes negotiation options that allow Samba to negotiate
non-Unicode operation, we have found that a combination of non-Unicode bugs in NT and
the need for uniform handling of internationalisation has forced us to rewrite Samba to use
Unicode throughout. Jeremy Allison is doing most of the work on the Unicode rewrite.

One big advantage of this rewrite will be that bugs involving multibyte character sets should 
be found much faster, as they will affect the primary Samba developers (who all speak single 
byte encoded languages). 

PDC Support


The long awaited full PDC support will make its debut in Samba 3.0. It has been mostly 
prototyped by Luke Leighton in the experimental SAMBA_TNG branch and now just needs 
to be integrated into 3.0 for production use. 

The code allows Samba to participate as a full PDC or BDC in an NT domain. Most of the
work is done by a separate msrpcd daemon, communicating with smbd via a unix domain
socket.

Right now the MSRPC code is implemented by hand in C, but some work is being done to 
reimplement it via a code generator. 

VFS Support


The VFS code implemented by Tim Potter is being integrated into Samba 3.0. This code 
allows Samba to dynamically load alternative file IO implementations, replacing the Posix IO
layer normally used. 

The VFS system was initially developed to allow easy integration of Samba with a large tape 
silo system, but will find many other uses, including a clean way of doing automatic cr/lf 
translation. 

A Perl binding for the VFS system has also been implemented, although it is unlikely to see 
widespread use! 

NSS support


While successful as a server, Samba still lags in its ability to seamlessly integrate a Unix
machine as a client in an NT network. smbfs provides basic client filesharing, but does not
address name resolution or authentication.

To address this a number of NSS modules are being developed that allow an NSS-aware client
(such as Linux) to use an SMB server for hostname resolution, authentication and user/group
enumeration.

The hostname resolution (WINS) NSS module is complete, with the other NSS modules in the 
design stage. 

libsamba


An increasing number of packages are using capabilities from Samba for authentication or
SMB file access. To make maintenance easier it is planned to produce a library version of the
core code in Samba for Samba 3.0.

The principal problem faced will be the need to clean up the name space used in Samba to
prevent name space pollution and conflicts with other applications.

New documentation


Thanks to O’Reilly releasing their book "Using Samba" under an open content license, we 
now have a new documentation resource for Samba. The Samba Team has adopted this book 
and will be keeping it up to date as changes are made in Samba. 

This fills a large gap in the Samba documentation. Previously we had nothing that could be
called a user guide, which left users without an in depth technical resource to refer to.

NT printing support


NT uses quite a different printing model to Win9X. Called Spoolss, it is based on MSRPC and 
uses a very complex and undocumented set of remote procedure calls to implement all aspects 
of NT printing. 

Up till now Samba has used a trick to avoid the necessity of reverse engineering spoolss. The 
trick caused NT workstations to fall back to Win9X style printing on Samba servers. This
prevented some printing features from working with NT clients.

Thanks to the efforts of Jean Francois Micouleau, Samba now has a full spoolss 
implementation, which will debut in Samba 3.0 on top of the new MSRPC implementation. 
This will provide the full range of NT printing capabilities. 

New printing backend


The dependence of some WinXX applications on a match between printing IDs during a print 
spool operation and during subsequent print queue query operations led to the development 
of the new tdb database module. The motivation was to provide a persistent mapping
between internal spooling structures and the underlying print system.

For Samba 3.0 a new printing backend that uses an internal print queue database will be 
implemented. This will solve a number of problems with the existing printing backend that so
far have required nasty hacks to work around.

New memory handling system


A nasty hack in the parameter handling code in Samba came back to haunt us in early 2000 
when a bug was observed that resulted in incorrect name query requests being sent during 
domain authentication. 

The bug was caused by some invalid memory handling in lp_string. The fix required a new
pool based memory allocator which, while very simple in itself, provides a useful new internal
tool in Samba. It is expected that the new system, called talloc, will be used extensively in the
MS RPC code.

talloc provides a context based temporary memory allocator which allows all memory
associated with a context to be freed in one go, removing the need for a lot of cumbersome
bookkeeping.
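
The idea of a context based allocator can be illustrated with a small model. The Python sketch below is conceptual only: the PoolContext class and its methods are hypothetical names and are not the talloc interface, and Python's own memory management stands in for the underlying allocator.

```python
class PoolContext:
    """Every allocation is recorded against the context that made it."""

    def __init__(self, name):
        self.name = name
        self._chunks = []

    def alloc(self, size):
        chunk = bytearray(size)
        self._chunks.append(chunk)
        return chunk

    def strdup(self, text):
        data = text.encode()
        chunk = self.alloc(len(data))
        chunk[:] = data
        return chunk

    def total_size(self):
        return sum(len(chunk) for chunk in self._chunks)

    def destroy(self):
        """Release everything owned by the context in one go."""
        count = len(self._chunks)
        self._chunks.clear()
        return count


if __name__ == "__main__":
    ctx = PoolContext("rpc request")
    ctx.alloc(16)
    ctx.strdup("WORKGROUP")
    print(ctx.total_size(), ctx.destroy())
```

Code that handles a request can allocate freely against the request's context and rely on a single destroy call at the end, instead of tracking each buffer separately.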

Auto-diagnosis system


SWAT, the Samba Web Administration Tool, has made Samba much easier to administer for 
a large number of users. The next step in the development of SWAT is the addition of an
auto-diagnosis system that will go through the steps for diagnosing a Samba server and
provide a single page report on the status of the system. 

This should make diagnosing Samba errors much easier, and will also help in making bug 
reporting more consistent. 

Current Status


Many of the features for Samba 3.0 are already complete, but some require a lot more work. 
The following gives the approximate status in January 2000. 

• New internal database system - COMPLETE 

• auto-generated parse code - STARTED 

• Unicode rewrite - PARTLY COMPLETE 

• PDC Support - PARTLY COMPLETE 

• Loadable VFS system - COMPLETE 

• NSS interfaces to Samba - COMPLETE 

• Library version of Samba code - NOT STARTED 

• New documentation - COMPLETE 

• spoolss support for NT printing - COMPLETE 

• New printing backend - COMPLETE 

• New memory handling system - COMPLETE 

• Auto-diagnosis system - NOT STARTED 